Reply by JJ November 21, 2005
You should model your intentions in C or another language, with both a
floating-point and an integer version. If you use doubles and compare the
final result with your proposed integer model, you will see a
dramatically different response; errors tend to accumulate unevenly.
Even on a fairly small FFT used by a DCT in a wavelet project, we got
caught by this one, even with a PhD on board. The logic design had been
almost done without error analysis when the anomalous integer results
showed error peaks of several bits in magnitude, and the shifting had to
be corrected to meet spec. Luckily, in an ASIC this can be reduced to an
additional couple of gates of rounding logic in the overall design. In an
FPGA this would not be the case; I'm not sure whether any of them include
rounded division by 2^n.
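
As a minimal illustration (my own sketch, not our project's actual model) of why the floating-point reference matters: repeatedly halving with a plain arithmetic shift drifts steadily away from the double-precision result, because the shift rounds toward negative infinity and the bias compounds stage by stage.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    double  ref = -20001.0;   /* ideal (double) model */
    int32_t fix = -20001;     /* integer model with plain shifts */
    for (int stage = 0; stage < 4; stage++) {
        ref /= 2.0;           /* exact scaling by 1/2 */
        fix >>= 1;            /* arithmetic shift: rounds toward -inf */
        printf("stage %d: ref %10.4f  fix %6d  err %7.4f\n",
               stage, ref, (int)fix, (double)fix - ref);
    }
    return 0;
}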

In order to preserve the best S/N ratio you need to scale each stage down
by a proper division by sqrt(2), or by 2 typically every other rank. In
two's complement math, a shift down by n bits is not the same as dividing
by 2^n, although with larger n it gets closer. Dividing by 2 requires
that negative numbers scale towards 0 the same way positive numbers do.
This means sign-sensitive rounding that considers the sign bit together
with the bits falling off the bottom. Once this is done, the integer
results can be a very good approximation of the full floating-point design.
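
In C, sign-sensitive scaling might look like the following sketch (function names are mine, for illustration only); a plain arithmetic shift biases negative values downward, so a sign-dependent correction is applied before the shift.

#include <stdint.h>

/* Truncate toward zero: negative numbers scale towards 0
   the same way positive numbers do. */
static int32_t div_pow2_toward_zero(int32_t x, unsigned n)
{
    if (x < 0)
        x += (int32_t)(1u << n) - 1;  /* pre-bias negatives */
    return x >> n;                    /* shift now rounds toward 0 */
}

/* Round half away from zero: looks at the sign bit together with the
   bits about to fall off the bottom. Requires n >= 1; values near
   INT32_MIN would overflow the negation in this simple sketch. */
static int32_t div_pow2_round(int32_t x, unsigned n)
{
    int32_t half = (int32_t)(1u << (n - 1));
    return (x >= 0) ? (x + half) >> n : -((-x + half) >> n);
}

With rounding like this in each scaled stage, the integer pipeline tracks the floating-point model much more closely.
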
From this I think you can conclude that FFTs generally need 1/2 bit of
extra precision for each doubling in size to keep the same S/N ratio.
Most of the good DSP texts cover error analysis as a routine part of FFT
design; I still use the old blue book from the 70s, Rabiner I think.

johnjakson
Reply by Ray Andraka November 21, 2005
satpreetsingh@gmail.com wrote:

First off, you can't discard the high bits out of the multiplier. The
twiddle factors, as you observed, are sines and cosines with nominal
values between -1 and 1; however, they are represented by moving the
position of the implied radix point to the left end of the 16-bit value.
The product therefore has its radix point also moved to the left by
16 bits. If 16 bits of precision is sufficient (it won't be for larger
FFTs), then you need to round off the lsbs of the product. You'll need
to round symmetrically to avoid introducing a bias, as the FFT is
sensitive to rounding bias.
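
For concreteness, here is one way that rounded product could look in C (a sketch under the assumptions above: Q1.15 operands, symmetric round-half-away-from-zero; the function name is mine):

#include <stdint.h>

static int16_t q15_mul_round(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * b;    /* 32-bit product, radix point 30 bits in */
    int32_t half = 1 << 14;        /* half an lsb of the 16-bit result */
    int32_t r = (p >= 0) ? (p + half) >> 15       /* symmetric rounding: */
                         : -((-p + half) >> 15);  /* same both signs */
    if (r >  32767) r =  32767;    /* -1 * -1 is the one overflow case */
    if (r < -32768) r = -32768;
    return (int16_t)r;
}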

16 bits of precision is going to limit the size of the FFT as well as the
precision of the results. The amount of limiting depends in part on the
nature of the input, but I wouldn't expect to get much larger than about
256 points without getting substantial artifacts from the limited
precision. You should model your application with a bit-accurate model
to verify that the noise due to internal truncation is not going to be a
problem for your application before expending a lot of effort on the design.
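
A bit-accurate check can start as small as the following sketch (my own harness, hypothetical names): run the same multiply in double precision and in 16-bit fixed point, and log the worst-case disagreement in lsbs.

#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hardware-style multiply that simply truncates the low bits. */
static int16_t q15_mul_trunc(int16_t a, int16_t b)
{
    return (int16_t)(((int32_t)a * b) >> 15);
}

int main(void)
{
    double worst = 0.0;
    for (int i = 0; i < 1000000; i++) {
        /* -32767..32767; assumes RAND_MAX >= 65534 for full coverage */
        int16_t a = (int16_t)(rand() % 65535 - 32767);
        int16_t b = (int16_t)(rand() % 65535 - 32767);
        double ideal = ((int32_t)a * b) / 32768.0;  /* exact, in lsbs */
        double err   = fabs((double)q15_mul_trunc(a, b) - ideal);
        if (err > worst) worst = err;
    }
    printf("worst-case truncation error: %.4f lsb\n", worst);
    return 0;
}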
Reply by Robin Bruce November 21, 2005
I'm worried that you might not have fully considered the tradeoffs
you're proposing to make. Dynamic range and precision are very
different things. What do you mean by "one should expect the same number
of bits in the output as in the input"? If you're imagining that there
are going to be 16 bits that are all zeros that you can happily chop off,
then you will be sorely disappointed by what you see in the hardware.
Both multiplicands could well use all 16 bits for precision. As long as
they're correctly aligned, you can multiply the two numbers, keeping only
the top 16 bits of your result. The magnitudes will be fine, but you
will have lost a lot of precision. You'll also have to watch out for
overflow when you're summing the results of your multiplications.
Again, when you do the additions, you'll have to discard the bottom
bits of your results in order to have 16-bit outputs for your
butterfly. This sacrifice of precision (which is perfectly reasonable)
will mean that there is a fundamental limit to the number of stages
(and therefore the largest FFT) you can have. The error will accumulate
and will at some point cause the results to break down. You could work
this out mathematically, but I'm sure someone on this board must have
done something similar and will be able to tell us how large an FFT you
can perform using such a technique.
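
To make that concrete, here's a rough sketch in C of the kind of butterfly I'm describing (my own illustration, untested; truncating shifts are used for brevity, and full-scale -32768 inputs are ignored, so the rounding issues raised elsewhere in the thread still apply):

#include <stdint.h>

typedef struct { int16_t re, im; } cpx16;

/* Keep the top bits of the Q1.15 x Q1.15 product. */
static int16_t top16(int32_t p) { return (int16_t)(p >> 15); }

static void butterfly(cpx16 *a, cpx16 *b, cpx16 w)
{
    /* t = w * b: four 16x16 multiplies, keeping only the top bits */
    int32_t tre = (int32_t)w.re * b->re - (int32_t)w.im * b->im;
    int32_t tim = (int32_t)w.re * b->im + (int32_t)w.im * b->re;
    int16_t tr = top16(tre);
    int16_t ti = top16(tim);

    /* a' = (a + t)/2, b' = (a - t)/2: the shift prevents 16-bit
       overflow but discards one more low bit at every stage */
    int16_t ar = a->re, ai = a->im;
    a->re = (int16_t)((ar + tr) >> 1);
    a->im = (int16_t)((ai + ti) >> 1);
    b->re = (int16_t)((ar - tr) >> 1);
    b->im = (int16_t)((ai - ti) >> 1);
}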

All the above is given with the following caveats:
1. I'm new to this sort of thing.
2. I'm a notorious idiot.

so if anyone can yay or nay the above, it would be helpful :)

Reply by satpreetsingh@gmail.com November 21, 2005
Q: I'm making an FFT block in hardware (on an FPGA) and I need some
advice on multipliers:

1. I have made a simple (fixed-point arithmetic) radix-2
decimation-in-time butterfly block which takes in two 16-bit complex
inputs and a 16-bit twiddle-factor input, and produces two complex
outputs. Now, with any standard multiplier circuit, multiplying
N bits by N bits gives a 2N-bit product. However, as the twiddle
factors are actually cos()/sin() terms, i.e. less than or equal to 1,
one should expect the same number of bits in the output as in the input.
So what I've done is made a multiplier circuit which simply
neglects (or masks off) N bits from the 2N-bit result. This way I can
make a single butterfly which can be cascaded to give 2^n-sized
(larger) FFTs. I just wanted to confirm that I'm on the right track...

Am I forgetting something in my implementation, or is this sufficient?

2. Newer FPGAs have multipliers/arithmetic circuits built into them. Is
there any way to exploit these internal features without wastage, such
as by masking the higher bits?

Thank you,
Satpreet 
India