FPGARelated.com
Forums

Polynomial Function ...

Started by Kappa September 1, 2009
Hi,

I have one analog signal input. I want to apply at analog signal a
transfer function of the type :

y = a0 + a1 * x + a2 * x^2 + a3 * x^3 + a4 * x^4 + a5 * x^5 + a6 * x^6
+ a7 * x^7 + a8 * x^8 + a9 * x^9

Polynomial semplification :

y = a0 + x * (a1 + x * (a2 + x * (a3 + x * (a4 + x * (a5 + x (a6 + x *
(a7 + x * (a8 + x * a9))))))))

There are specific techniques and/or optimization for building this
polynomial on FPGA ?

Kappa.
On Tue, 1 Sep 2009 00:38:54 -0700 (PDT), Kappa wrote:

>I have one analog signal input. I want to apply at analog signal a >transfer function of the type : > >y = a0 + a1 * x + a2 * x^2 + a3 * x^3 + a4 * x^4 + a5 * x^5 + a6 * x^6 >+ a7 * x^7 + a8 * x^8 + a9 * x^9 > >Polynomial semplification : > >y = a0 + x * (a1 + x * (a2 + x * (a3 + x * (a4 + x * (a5 + x (a6 + x * >(a7 + x * (a8 + x * a9)))))))) > >There are specific techniques and/or optimization for building this >polynomial on FPGA ?
The second form can very easily be pipelined. For streaming data such as your analog signal, and on an FPGA, pipelining is usually almost free. However, digitized analog input inevitably has a limited number of bits and it may be better to use a lookup table. You don't need the full resolution in your table; instead, store the function and its first difference, and use linear interpolation. Your function then becomes one lookup, one multiply and one add - instead of your nine multiply-add operations. -- Jonathan Bromley, Consultant DOULOS - Developing Design Know-how VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK jonathan.bromley@MYCOMPANY.com http://www.MYCOMPANY.com The contents of this message may contain personal views which are not the views of Doulos Ltd., unless specifically stated.
Hi Jonathan Bromley,

> The second form can very easily be pipelined. =A0For streaming > data such as your analog signal, and on an FPGA, pipelining > is usually almost free.
In fact, the realization is not difficult.
> However, digitized analog input inevitably has a limited > number of bits and it may be better to use a lookup table. > You don't need the full resolution in your table; instead, > store the function and its first difference, and use linear > interpolation. =A0Your function then becomes one lookup, one > multiply and one add - instead of your nine multiply-add > operations.
But this technique preserves the resolution of the classical technique, the old one ? Consider input of signed 18 bit with 15 binary point (in/out of function) and signed 32 bit with 15 binary point "a" coefficient. Thanks. Kappa.
Kappa wrote:
> Hi Jonathan Bromley, > >> The second form can very easily be pipelined. For streaming >> data such as your analog signal, and on an FPGA, pipelining >> is usually almost free. > > In fact, the realization is not difficult. > >> However, digitized analog input inevitably has a limited >> number of bits and it may be better to use a lookup table. >> You don't need the full resolution in your table; instead, >> store the function and its first difference, and use linear >> interpolation. Your function then becomes one lookup, one >> multiply and one add - instead of your nine multiply-add >> operations. > > But this technique preserves the resolution of the classical > technique, the old one ? > > Consider input of signed 18 bit with 15 binary point (in/out of > function) and signed 32 bit with 15 binary point "a" coefficient. >
The trouble with a large polynomial like this is that the higher power parts are very sensitive to the resolution. For polynomials expressed in this form, you'll want a great deal more binary point bits for the later a_n coefficients, or to use floating point (which makes things much more complicated). With care and enough bits, and different placement of the binary points depending on the values of a_n, you might make it work well enough. A lookup table with linear interpolation much easier to make accurately than a large polynomial like this. You could also compromise and use a lookup table and quadratic or cubic interpolation to get more accuracy from a smaller table. Another idea, if your maths is up to it, is to use Chebyshev polynomials instead of powers of x. You would need more multipliers (18 for a ninth power polynomial, if I've done my sums right) and more adders, but all your partial calculations and coefficients are on the same scale making it much easier. http://en.wikipedia.org/wiki/Chebyshev_polynomials
On Tue, 1 Sep 2009 01:48:15 -0700 (PDT), Kappa wrote:

>> You don't need the full resolution in your table; instead, >> store the function and its first difference, and use linear >> interpolation. �Your function then becomes one lookup, one >> multiply and one add - instead of your nine multiply-add >> operations. > >But this technique preserves the resolution of the classical >technique, the old one ?
Well.... as the saying goes, "go figure". Linear interpolation (or any interpolation) can always give sufficiently good resolution, if the table points are sufficiently closely spaced. The details obviously depend on the values of your higher-order coefficients.
>Consider input of signed 18 bit with 15 binary point (in/out of >function) and signed 32 bit with 15 binary point "a" coefficient.
With an 18-bit input, you have only 128K different input values. It would be easy to write a small program to calculate all the result values, and then compare them with the values obtained by interpolation (or any other method). You can then decide whether the errors are small enough to be insignificant. Consider this, though. Suppose you make a 1024-entry table. Use the top 10 bits of your 18-bit data to address into the table. Use the lower 8 bits for linear interpolation. Now, the error introduced by ignoring the x^2 term is less than one part in 1,000,000. Assuming the x^2 coefficient (a2) is no more than 4 times bigger than the linear coefficient (a0), that means the error will disappear in your 18-bit result. The same argument applies, even more strongly, for the x^3 and higher terms. You will see significant errors in your 18-bit result only when you have a2, a3, ... coefficients that are much larger than a0, a1. I'm intrigued to know why you are applying a 9th order polynomial transfer function to your signal. -- Jonathan Bromley, Consultant DOULOS - Developing Design Know-how VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK jonathan.bromley@MYCOMPANY.com http://www.MYCOMPANY.com The contents of this message may contain personal views which are not the views of Doulos Ltd., unless specifically stated.
On Tue, 1 Sep 2009 00:38:54 -0700 (PDT), Kappa <secureasm@gmail.com> wrote:

>Hi, > >I have one analog signal input. I want to apply at analog signal a >transfer function of the type : > >y = a0 + a1 * x + a2 * x^2 + a3 * x^3 + a4 * x^4 + a5 * x^5 + a6 * x^6 >+ a7 * x^7 + a8 * x^8 + a9 * x^9 > >Polynomial semplification : > >y = a0 + x * (a1 + x * (a2 + x * (a3 + x * (a4 + x * (a5 + x (a6 + x * >(a7 + x * (a8 + x * a9)))))))) > >There are specific techniques and/or optimization for building this >polynomial on FPGA ? > >Kappa.
There are any number of ways to implement it; as Jonathan said your pipelined implementation is easy and cheap (though you will inevitably lose resolution through the successive multiplications; see below) His suggestion of interpolation is good, though I tend to add a quadratic term to guarantee accuracy (this requires three multiplications rather than one; but is accurate enough for single precision floating point; i.e. 24-bit or better accuracy, on square root, reciprocal, etc) Or if you prefer a direct implementation, consider the range of coefficient values for your higher order terms; they are typically small. Consider the rounding error imposed by truncating the product of x and the largest coefficient; you can reduce the resolution of the smaller multiplications until they introduce a similar (preferably still smaller) error with economic benefits (reduced resources) Or if latency is important, note that the slowest path, a9 * x^9 = (a9 * x) * ((x*x) * (x*x)) * (ditto) which takes 4 cycles to compute; plus a fifth to add it to the (parallel computed) sum of the faster paths. (Implementation issues may require additional cycles to move data between multipliers) Only you can determine if the loss of resolution due to e.g. truncating multiplier outputs,or economising on word widths is acceptable; i.e. within your error budget. This can be exhaustively tested in simulation (for a single variable input as here) by a testbench implementing comparison with an exact solution, and logging the magnitude and spread of errors. For a single-precision FP square root unit (quadratic interpolator) I tested all 2^24 input values in simulation in about an hour; you can afford to do this after any major design change if accuracy is important; normally I only re-tested the most sensitive region. But for prototyping the algorithm and testing e.g. the accuracy of an interpolated LUT, I suggest starting with a simple spreadsheet! You can truncate values to 16 (or 19) bits and quickly investigate tradeoffs to get a pretty good idea of what you are doing before moving to VHDL. - Brian
Hi David Brown,

> The trouble with a large polynomial like this is that the higher power > parts are very sensitive to the resolution. =A0For polynomials expressed > in this form, you'll want a great deal more binary point bits for the > later a_n coefficients, or to use floating point (which makes things > much more complicated). =A0With care and enough bits, and different > placement of the binary points depending on the values of a_n, you might > make it work well enough.
In fact, the problem is precisely the higher coefficients. Coefficients range from 0.000xxx to xxxxxx.xxx, the problem is to optimize the multiplier. I was thinking about going down to 7 or 5 coefficients.
> A lookup table with linear interpolation much easier to make accurately > than a large polynomial like this. =A0You could also compromise and use a > lookup table and quadratic or cubic interpolation to get more accuracy > from a smaller table.
I never facing a problem like this until now. so I do not know where to begin to implement this technique.
> Another idea, if your maths is up to it, is to use Chebyshev polynomials > instead of powers of x. =A0You would need more multipliers (18 for a nint=
h
> power polynomial, if I've done my sums right) and more adders, but all > your partial calculations and coefficients are on the same scale making > it much easier. > > http://en.wikipedia.org/wiki/Chebyshev_polynomials
No. I can not. I have to save as much resources as possible. Thanks. Kappa.
Hi Jonathan,

> With an 18-bit input, you have only 128K different input values. > It would be easy to write a small program to calculate all the > result values, and then compare them with the values obtained by > interpolation (or any other method). =A0You can then decide whether > the errors are small enough to be insignificant.
Was what I was thinking. But to implement a memory 128K x 18 bits requires too many resources.
> Consider this, though. =A0Suppose you make a 1024-entry table. =A0Use > the top 10 bits of your 18-bit data to address into the table. > Use the lower 8 bits for linear interpolation. =A0
This technique is not very clear to me ... can get more details?
> I'm intrigued to know why you are applying a 9th order > polynomial transfer function to your signal.
I'm realizing Digital Predistortion. With 9 polynomial coefficients i'm doing the fitting function for the amplifier. It would be sufficient even 5 or 7 I was experiencing with 9. Thanks. Kappa.
Kappa wrote:
> Hi David Brown, > >> The trouble with a large polynomial like this is that the higher power >> parts are very sensitive to the resolution. For polynomials expressed >> in this form, you'll want a great deal more binary point bits for the >> later a_n coefficients, or to use floating point (which makes things >> much more complicated). With care and enough bits, and different >> placement of the binary points depending on the values of a_n, you might >> make it work well enough. > > In fact, the problem is precisely the higher coefficients. > Coefficients range from 0.000xxx to xxxxxx.xxx, the problem is to > optimize the multiplier. I was thinking about going down to 7 or 5 > coefficients. >
Are you able to post the function and its coefficients, or perhaps the original function that the polynomial approximates? Where did the polynomial come from in the first place? If it is an approximation made to fit existing data, then it would be better to use the raw data as the basis for your lookup table.
>> A lookup table with linear interpolation much easier to make accurately >> than a large polynomial like this. You could also compromise and use a >> lookup table and quadratic or cubic interpolation to get more accuracy >> from a smaller table. > > I never facing a problem like this until now. so I do not know where > to begin to implement this technique. >
Wikipedia is a reasonable starting point: http://en.wikipedia.org/wiki/Cubic_spline http://en.wikipedia.org/wiki/Cubic_Hermite_spline
>> Another idea, if your maths is up to it, is to use Chebyshev polynomials >> instead of powers of x. You would need more multipliers (18 for a ninth >> power polynomial, if I've done my sums right) and more adders, but all >> your partial calculations and coefficients are on the same scale making >> it much easier. >> >> http://en.wikipedia.org/wiki/Chebyshev_polynomials > > No. I can not. I have to save as much resources as possible. >
You might need more multipliers for Chebyshev, but they will (should!) be narrower than you would need for your Horner's method polynomial with a given accuracy. Maybe you'll end up with less resources overall.
Hi Brain,

> There are any number of ways to implement it; as Jonathan said your pipelined > implementation is easy and cheap (though you will inevitably lose resolution > through the successive multiplications; see below)
I already have been realized and tested. It's works very well, better than what I expected, but it very expensive resource.
> His suggestion of interpolation is good, though I tend to add a quadratic term > to guarantee accuracy (this requires three multiplications rather than one; but > is accurate enough for single precision floating point; i.e. 24-bit or better > accuracy, on square root, reciprocal, etc)
It would be perfect, the problem is that I do not know the technique.
> Or if you prefer a direct implementation, consider the range of coefficient > values for your higher order terms; they are typically small. Consider the > rounding error imposed by truncating the product of x and the largest > coefficient; you can reduce the resolution of the smaller multiplications until > they introduce a similar (preferably still smaller) error with economic benefits > (reduced resources)
I'd rather save resources instead of direct implementation. It must be mentioned that the coefficients will be changed at runtime, will not be fixed. I must be able to move without risk of overflow or underflow.
> Or if latency is important, note that the slowest path, > a9 * x^9 = (a9 * x) * ((x*x) * (x*x)) * (ditto) > which takes 4 cycles to compute; plus a fifth to add it to the (parallel > computed) sum of the faster paths. (Implementation issues may require additional > cycles to move data between multipliers)
This is not a problem. Now direct form working until 125 MHz.
> Only you can determine if the loss of resolution due to e.g. truncating > multiplier outputs,or economising on word widths is acceptable; i.e. within your > error budget.
Here I agree, for now I am more interested in a model that saves resources.
> But for prototyping the algorithm and testing e.g. the accuracy of an > interpolated LUT, I suggest starting with a simple spreadsheet! You can truncate > values to 16 (or 19) bits and quickly investigate tradeoffs to get a pretty good > idea of what you are doing before moving to VHDL.
I am always rushed and I can not wait to see the result after I realize that a whale never enters the backside of a mouse ... :-) ... Thanks. Kappa.