Hello, I am implementing a 128 point real Radix-2 fft, data and coefficient widths are 16 bit. I am synthezising it for use in an FPGA. However, it is taking a very long time to synthesize. (approx 3 days using Leonardo on a 2 GHz machine with 512 MByte RAM) I am using a 20K1000 Altera FPGA. The ram required by the fft will be internal to the FPGA Will this design take up all the space on the device. From past experience, can someone give me an indication of what area of the device the fft will occupy. Surely if it takes up most of the device, then it will be too big, as I have other features to implement in the FPGA also !! Thank you PJ
fft size in fpga
Started by ●September 14, 2003
Reply by ●September 15, 20032003-09-15
It very heavily depends on your implementation. The fact that it is radix2 and not radix 4 or higher tells me already that your implementation is not a very efficient one. A radix 4 kernel is very little added complexity over a radix 2 kernel and cuts the processing time and area considerable. Depending on your speed requirements and your design prowess, you can certainly fit a 128 point real-only FFT in a much smaller part than a 1M gate part. As your speed requirements decrease, you can take advantage of iterative or shared hardware to reduce the gate count considerably. I did a 4096 point design in a Xilinx XCV1000 that does the complex transform in 68 us (the floorplan and a brief description are on the gallery page on my website). That one only occupies about 40% of the FPGA and includes some floating point stuff as well as windowing multipliers and some other goodies. It takes intimate knowledge of the FPGA structure and of algorithms to achieve that level of density (the customer called the design "a work of art"), but even a novice can achieve a density/performance level of half that design with some carefully thought out design. PJ wrote:> Hello, > > I am implementing a 128 point real Radix-2 fft, data and coefficient > widths are 16 bit. > I am synthezising it for use in an FPGA. However, it is taking a very > long time to synthesize. (approx 3 days using Leonardo on a 2 GHz > machine with 512 MByte RAM) I am using a 20K1000 Altera FPGA. The ram > required by the fft will be internal to the FPGA > > Will this design take up all the space on the device. From past > experience, can someone give me an indication of what area of the > device the fft will occupy. > > Surely if it takes up most of the device, then it will be too big, as > I have other features to implement in the FPGA also !! > > Thank you > PJ-- --Ray Andraka, P.E. President, the Andraka Consulting Group, Inc. 401/884-7930 Fax 401/884-7950 email ray@andraka.com http://www.andraka.com "They that give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin, 1759
Reply by ●September 16, 20032003-09-16
> PJ wrote: > > > Hello, > > > > I am implementing a 128 point real Radix-2 fft, data and coefficient > > widths are 16 bit. > > I am synthezising it for use in an FPGA. However, it is taking a very > > long time to synthesize. (approx 3 days using Leonardo on a 2 GHz > > machine with 512 MByte RAM) I am using a 20K1000 Altera FPGA. The ram > > required by the fft will be internal to the FPGAHiya, I have noted with some synthesizers that if it cannot infer your RAM "correctly" it will try to build it out of registers (as opposed to the dedicated RAM on the FPGA). This can end up taking a _very_ long time for only reasonably large RAMs. You might consider synthesizing each module in your design separately until you find the part that causes the problem. If it isn't already structured thus it might be wise to do so as this may also help. Later, Andyman.
Reply by ●September 19, 20032003-09-19
Several of the textbooks on FFTs such as Smith & Smith "Handbook of realtime FFTs" describe higher radix FFTs. I believe the Xilinx FFT core data sheet goes into a fair amount of detail on a radix 4 kernel. The easiest way to do hardware sharing is to use one kernel and run the data through it in multiple passes. You'll need to change the twiddle factors (which are really a phase rotation) and the data ordering. The twiddles can be handled by changing the input to a multiplier using a table. The data reordering is doable by permuting the bits of your address counter. DA is really a technique for doing multiply-accumulate, not just a multiply. There are many ways to do the multiplications, for example if speed is not an issue you can use a scaling accumulator for a fairly compact multiplier. See the page about multiplication on my website. Flip-flop count is not the whole story, you also have to account for the combinatorial logic between flip-flops. You could do a totally combinatorial multiplier with no flip=flops, for example (although it would be quite slow if it is reasonably sized) PJ wrote:> Hi Ray/Andyman, > > Thanks for replying. > > Ray .... > Is there a good explanation on radix-4 ffts ? If I can mod the code > quickly, I will certainly do it. As regarding shared hardware, I > thought about that when implementing my design, but I really couldn't > figure out a way to do it. Are you talking about reducing the number > of stages ? Did you use RAM to provide the hardware sharing ? > > I have let the synthesis tools generate the multiplier. I just used > the '*' function. This is one area I think I could make good space > saving. I have seen your articles about multipliers on your website. > Would a DA type multiplier save me much space ? > > What is the most efficient multiplier in terms of flop count, and is > there VHDL code or verilog code available. I'll gladly write it > myself, if I can get some pointers. I can trade speed for gate count. > My problem now is the lack of time I have to get this project > completed. > > Andyman > > I have implemented RAM in my design, and thankfully it synthesis to > RAM fine. However in saying that...that is for very small designs. > This certainly might be a problem. I am using leonardo spectrum, and > will give synplify a try next. > If you found any solutions...I would be glad to hear. > > Thank you both for your time and your replies. > > Good luck > PJ > > Ray Andraka <ray@andraka.com> wrote in message news:<3F664CBB.86459E77@andraka.com>... > > It very heavily depends on your implementation. The fact that it is > > radix2 and > > not radix 4 or higher tells me already that your implementation is not a > > very > > efficient one. A radix 4 kernel is very little added complexity over a > > radix 2 > > kernel and cuts the processing time and area considerable. Depending on > > your > > speed requirements and your design prowess, you can certainly fit a 128 > > point > > real-only FFT in a much smaller part than a 1M gate part. As your speed > > requirements decrease, you can take advantage of iterative or shared > > hardware > > to reduce the gate count considerably. I did a 4096 point design in a > > Xilinx > > XCV1000 that does the complex transform in 68 us (the floorplan and a > > brief > > description are on the gallery page on my website). That one only > > occupies > > about 40% of the FPGA and includes some floating point stuff as well as > > windowing multipliers and some other goodies. It takes intimate > > knowledge > > of the FPGA structure and of algorithms to achieve that level of density > > (the customer called the design "a work of art"), but even a novice can > > achieve a density/performance level of half that design with some > > carefully > > thought out design. > > > > PJ wrote: > > > > > Hello, > > > > > > I am implementing a 128 point real Radix-2 fft, data and coefficient > > > widths are 16 bit. > > > I am synthezising it for use in an FPGA. However, it is taking a very > > > long time to synthesize. (approx 3 days using Leonardo on a 2 GHz > > > machine with 512 MByte RAM) I am using a 20K1000 Altera FPGA. The ram > > > required by the fft will be internal to the FPGA > > > > > > Will this design take up all the space on the device. From past > > > experience, can someone give me an indication of what area of the > > > device the fft will occupy. > > > > > > Surely if it takes up most of the device, then it will be too big, as > > > I have other features to implement in the FPGA also !! > > > > > > Thank you > > > PJ > > > > -- > > --Ray Andraka, P.E. > > President, the Andraka Consulting Group, Inc. > > 401/884-7930 Fax 401/884-7950 > > email ray@andraka.com > > http://www.andraka.com > > > > "They that give up essential liberty to obtain a little > > temporary safety deserve neither liberty nor safety." > > -Benjamin Franklin, 1759-- --Ray Andraka, P.E. President, the Andraka Consulting Group, Inc. 401/884-7930 Fax 401/884-7950 email ray@andraka.com http://www.andraka.com "They that give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin, 1759