
Estimating number of FPGAs needed for an application

Started by Unknown March 12, 2007
Hi all

I'm absolutely new to FPGAs; in fact, my work is much more related to
SW than to HW, so I have to solve a problem that was not really meant
for me.

The issue is this: I have to estimate (roughly) the number of FPGAs
needed to support a typical signal processing algorithm. The steps are
as follows, always in single precision:

1. 16k complex samples FFT
2. 16k complex vector multiplication
3. 16k complex samples IFFT
4. 16k complex vector multiplication
5. 16k complex vector sum

The idea is to know how many FPGAs will cover this kind of processing
in a given time, to compare with different types of processors. For
the latter it is really easy, just counting the number of operations
in GFLOPS, but with hardware devices I am having a lot of trouble,
since I don't have a clear understanding of what I should count.

Please, give me a hand!

Ruben
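
For the processor side of the comparison, the operation count Ruben mentions can be sketched in a few lines of Python. The 5*N*log2(N) figure for a radix-2 complex FFT and the 6-FLOP complex multiply (4 multiplies + 2 adds) are the usual textbook estimates; the 1 ms repetition period is purely an illustrative assumption, since the thread never states a rate.

import math

N = 16 * 1024                       # 16k complex samples

fft_flops  = 5 * N * math.log2(N)   # one complex FFT or IFFT, ~5*N*log2(N)
cmul_flops = 6 * N                  # element-wise complex multiply
cadd_flops = 2 * N                  # element-wise complex add

total = 2 * fft_flops + 2 * cmul_flops + cadd_flops   # steps 1-5

print(f"FLOPs per data set: {total / 1e6:.2f} MFLOPs")

# Assumed repetition period of one data set per millisecond (example only).
repetition_period_s = 1e-3
print(f"Required sustained rate: {total / repetition_period_s / 1e9:.2f} GFLOPS")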

On Mar 12, 1:35 pm, rbbla...@gmail.com wrote:
[...]
> The issue is this: I have to estimate (roughly) the number of FPGAs
> needed to support a typical signal processing algorithm [...]
> The idea is to know how many FPGAs will cover this kind of processing
> in a given time, to compare with different types of processors.
With a hardware implementation you will need to specify the time in which you want this algorithm to be processed. It makes a difference in the implementation: the faster you want to go, the more you need to implement in parallel and the more resources you will need.

For the FFT you can request a design fit from here: http://www.dilloneng.com/ip/fft/fftipfit_cpt
But that is a specific design fit for their FFT, so you might find that other vendors give you a different fit.

Cheers,
Guenter
There is additional information needed for this evaluation:
- how often do you need a result (throughput and latency)?
- what is the data type (integer? float? precision?)

Unlike CPUs, FPGAs have no native data types. For cryptographic
applications you might want to run an FFT on vectors of single bits.
For DNA matching you might have 2-bit or 4-bit data types. For DSP,
18-bit or 36-bit integers are a common choice for Xilinx FPGAs.

The algorithm that you describe, implemented serially on 1-bit data,
would use 1% of a small FPGA and run for several hundred thousand
clock cycles. On a large FPGA, on the other hand, you can perform a
few hundred 18-bit x 18-bit multiplications per cycle.

Kolja Sulimma
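
To make Kolja's trade-off concrete: once a time budget per data set is fixed, dividing the real multiplies per data set by the available clock cycles gives the number of hardware multipliers that must run in parallel. The sketch below assumes a 250 MHz clock, a 1 ms budget, and 4 real multiplies per radix-2 butterfly; none of these numbers come from the thread.

import math

N        = 16 * 1024
f_clk_hz = 250e6       # assumed FPGA clock
budget_s = 1e-3        # assumed time allowed per data set

# Real multiplies per data set: 4 per radix-2 butterfly for the FFT and
# IFFT, 4 per element for each complex vector multiply; the final vector
# sum needs only adders.
butterflies = (N // 2) * int(math.log2(N))
real_mults  = 2 * 4 * butterflies + 2 * 4 * N

cycles_available = f_clk_hz * budget_s
parallel_mults   = math.ceil(real_mults / cycles_available)

print(f"Real multiplies per data set: {real_mults}")
print(f"Multipliers needed in parallel: {parallel_mults}")

With the relaxed 1 ms budget this comes out to a handful of multipliers; shrinking the budget to 20 us pushes it into the few-hundred-multipliers regime Kolja mentions, which is where the choice of device family and size starts to dominate.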



rbblasco@gmail.com wrote:

(snip)
You left out a key piece of information: how fast do you need to compute these 5 steps? A processor that can do all 5 can fit on a single FPGA, provided there is a reasonable amount of time between data sets and there is enough memory available to buffer the input (if needed), store intermediate results, and buffer the output.

The wide swath ocean altimeter design featured in the gallery on my website ( http://www.andraka.com/wsoa.htm ), for example, does everything on your list, in the same order and more, in under 250 usec for a 4K point data set using very old (original Virtex) technology, which has comparatively little on-chip memory and no embedded multipliers. About two-thirds of the area is dedicated to storage buffers using SRL16s (the large cyan block in the middle right, the magenta/green block below it, and the yellow/green blocks at the bottom are all buffers). The FPGA size is small, features are sparse, and speed is slow by today's standards.

Implementation size depends heavily on the FFT implementation, of course. My FFT kernel has the smallest size-performance footprint, so using others will result in a bigger design for a given speed.
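
Ray's point about buffer memory can also be bounded quickly. The 18-bit word width and 18 kbit block-RAM size below are typical Xilinx figures used purely as assumptions; the real numbers come from the target device's data sheet, and the buffer count depends on how the pipeline is arranged.

import math

N          = 16 * 1024
word_bits  = 18        # per real or imaginary component (assumed)
bram_kbits = 18        # capacity of one block RAM, in kbits (assumed)

bits_per_buffer = N * 2 * word_bits    # complex data: I and Q
buffers         = 3                    # e.g. input, intermediate, output

total_kbits = buffers * bits_per_buffer / 1024
brams       = math.ceil(buffers * bits_per_buffer / (bram_kbits * 1024))

print(f"Buffer storage: {total_kbits:.0f} kbits, roughly {brams} block RAMs")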
comp.arch.fpga wrote:
(snip)

> For DNA matching you might have 2-bit or 4-bit data types.
For dynamic programming algorithms, the favorite way to do DNA matching, it is usual to use 16-bit fixed-point arithmetic.

-- glen
rbblasco@gmail.com wrote:

> The issue is this: I have to estimate (roughly) the number of FPGAs
> needed to support a typical signal processing algorithm [...] always
> in single precision:
(snip)
> The idea is to know how many FPGAs will cover this kind of processing
> in a given time, to compare with different types of processors.
First, floating point tends to be a lot bigger on FPGAs than fixed point, especially floating-point addition. If you can get away with fixed point, even if the actual width is somewhat larger, it is probably worth doing.

Also, you can't just count 'FPGAs'; you have to take into account the size of the different FPGAs, even from the same product family.

I like systolic array processors, which usually work well for this type of problem. The thought process for hardware implementations, especially good pipelined ones, is somewhat different than for software implementations. Usually hardware implementations are used when software isn't fast enough, so you need to know how fast it has to go.

There is a tradeoff between time and size, but it isn't linear enough to quote without more details.

-- glen
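
As a quick illustration of glen's fixed-point suggestion, the sketch below quantizes a unit-amplitude test signal to a 16-fractional-bit fixed-point format and reports the worst-case error; the format and the test signal are arbitrary examples, not anything prescribed in the thread.

import math

frac_bits = 16                  # assumed fractional width; with sign and
scale     = 1 << frac_bits      # integer bits this is near an 18-bit word

def to_fixed(x):
    # Round to the nearest representable fixed-point value.
    return round(x * scale) / scale

# Unit-amplitude test signal (arbitrary example).
samples = [math.cos(2 * math.pi * k / 100) for k in range(1000)]
errors  = [abs(x - to_fixed(x)) for x in samples]

print(f"Worst-case quantization error: {max(errors):.2e}")
print(f"Roughly {-math.log2(max(errors)):.1f} bits of accuracy")

Whether that accuracy is sufficient depends on the input's dynamic range and on how much bit growth the FFT stages need, so the width would have to be checked against the actual signal.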
With everyone else's previously mentioned comments in mind as well, I
would recommend downloading Xilinx's WebPACK tool. Open their "Core
Generator" software and run the FFT core from there. You can enter
things like processing frequency, sample frequency, etc., and it
will give you a resource utilization. You can also pull this
information from the datasheet for their radix-2 FFT core.

I wish you the best of luck, but you may want to recommend that your
boss consult a hardware engineer. With all due respect to software
engineers (I can't write decent C code to save my life), despite what
management likes to believe, FPGA design is hardware design, not
software design. Without a good deal of background experience in
digital design, you're going to find it difficult to make this kind
of estimate accurately. Again, nothing against software folks; it's
just a different set of training and experience that's required.


