FPGARelated.com
Forums

extracting D from 1 / D*D

Started by Bert_Paris August 17, 2011
Hi Folks,
Incredibly busy summer here, so before burning my brain cells, Googling 
or -worst- digging in my very dusty math courses, I submit this 
question to the DSP experts who usually float around, hoping some 
aren't at the beaches ;-)

How would you extract D from a X = K / (D * D) value (16 bits) ?
- in a small FPGA indeed, which has no embedded mult (but a mult could 
be synthesized)
- please don't answer to synthesize sqrt(1/x) !
- I have lots of clock cycles available to compensate the lack of FPGA 
resources.
- I would like to avoid a Piece-Wise Linear estimator if possible 
(though it would allow for coding sensor non-linearities).

In the past, I remember implementing an sqrt operator using a suite 
(the suite didn't converge as rapidly as it might, but the 
implementation was easy), so a similar technique would be nice.
But I keep an open mind : any idea is most welcome :-)

Thanks,
Bert


Bert_Paris <do_not_spam@me.com> wrote:

(snip)
> How would you extract D from a X = K / (D * D) value (16 bits) ? > - in a small FPGA indeed, which has no embedded mult (but a mult could > be synthesized) > - please don't answer to synthesize sqrt(1/x) ! > - I have lots of clock cycles available to compensate the lack of FPGA > resources. > - I would like to avoid a Piece-Wise Linear estimator if possible > (though it would allow for coding sensor non-linearities).
There is an iterative sqrt algorithm that only does subtraction. For only 16 bits, it shouldn't take a lot of LUTs, and not so many cycles, either. I haven't thought about coding it in logic lately, though. What about the K? -- glen
glen herrmannsfeldt said :

> There is an iterative sqrt algorithm that only does subtraction. > For only 16 bits, it shouldn't take a lot of LUTs, and not > so many cycles, either. I haven't thought about coding it > in logic lately, though. What about the K? > > -- glen
As mentioned, I have no problem regarding sqrt, but it is 1 / Sqr(D) that I have to transform. The solution I had in mind before posting was to use a non-restore division (for 1/X) followed by the iterative sqrt. But maybe some one has a better idea. Bert
On 17/08/2011 13:11, Bert_Paris wrote:
> Hi Folks, > Incredibly busy summer here, so before burning my brain cells, Googling > or -worst- digging in my very dusty math courses, I submit this question > to the DSP experts who usually float around, hoping some aren't at the > beaches ;-) > > How would you extract D from a X = K / (D * D) value (16 bits) ? > - in a small FPGA indeed, which has no embedded mult (but a mult could > be synthesized) > - please don't answer to synthesize sqrt(1/x) ! > - I have lots of clock cycles available to compensate the lack of FPGA > resources. > - I would like to avoid a Piece-Wise Linear estimator if possible > (though it would allow for coding sensor non-linearities).
Have you figured out what accuracy you need? It will make a /big/ difference to the ease of implementation. Implementing a linear interpolation means having a table of coefficients. Have you got space to allow lookup in a table of some sort? You can reduce the coefficients a bit by first left-shifting X by two bits repeatedly until you have a 1 in either of the two MSB's. Once you have calculated D, right-shift it by 1 bit for each 2 bits you left-shifted X. This will save you from having to have high-precision lookups for small X.
> > In the past, I remember implementing an sqrt operator using a suite (the > suite didn't converge as rapidly as it might, but the implementation was > easy), so a similar technique would be nice. > But I keep an open mind : any idea is most welcome :-) >
You don't just have to do a square root - you also have to do a reciprocal. These are both relatively hard to implement accurately (unlike multiplication, which is fairly easy - and quite small if you have lots of clock cycles and can do it bit for bit). If you are looking for a series of estimators, Newton-Raphson generally gives fast convergence. The sequence for your problem would be: D_{n+1} = (3/2).D_n - (X/2K).D_n^3 Unfortunately, that will need a number of multiplies per cycle. You would also need to test how well it worked with limited bit precision.
Thanks for your feedback !

David Brown a formul&#4294967295; la demande :
> On 17/08/2011 13:11, Bert_Paris wrote: >> >> How would you extract D from a X = K / (D * D) value (16 bits) ?
>> - in a small FPGA indeed, which has no embedded mult (but a mult could >> be synthesized) >> - please don't answer to synthesize sqrt(1/x) ! >> - I have lots of clock cycles available to compensate the lack of FPGA >> resources. >> - I would like to avoid a Piece-Wise Linear estimator if possible >> (though it would allow for coding sensor non-linearities). > > Have you figured out what accuracy you need? It will make a /big/ difference > to the ease of implementation.
10 bits looks good enough.
> Implementing a linear interpolation means having a table of coefficients. > Have you got space to allow lookup in a table of some sort?
Not ideal in the context (uninitialized RAMs) :-( but that's doable.
> You can reduce the coefficients a bit by first left-shifting X by two bits > repeatedly until you have a 1 in either of the two MSB's. Once you have > calculated D, right-shift it by 1 bit for each 2 bits you left-shifted X. > This will save you from having to have high-precision lookups for small X.
If I calculate the sqrt first, I may not need this.
> >> In the past, I remember implementing an sqrt operator using a suite (the >> suite didn't converge as rapidly as it might, but the implementation was >> easy), so a similar technique would be nice. >> But I keep an open mind : any idea is most welcome :-) >> > > You don't just have to do a square root - you also have to do a reciprocal.
Absolutely !
> These are both relatively hard to implement accurately (unlike > multiplication, which is fairly easy - and quite small if you have lots of > clock cycles and can do it bit for bit). > > If you are looking for a series of estimators, Newton-Raphson generally gives > fast convergence. The sequence for your problem would be: > > D_{n+1} = (3/2).D_n - (X/2K).D_n^3 > > Unfortunately, that will need a number of multiplies per cycle. You would > also need to test how well it worked with limited bit precision.
While googling, I found a "magical" estimator based on manipulating an IEEE floating point as an integer (which they refine with an iteration of the NR above). Not useful here I guess. After some thinking I will probably try this : - calculate sqrt in format 8.2 or 8.3 - use a 1024 entries LUT for the reciprocal or (due to lack of ROM) : - use a non-restore division 2**N / X Thanks !
Bert_Paris <do_not_spam@me.com> wrote:

(snip, I wrote)
>> There is an iterative sqrt algorithm that only does subtraction. >> For only 16 bits, it shouldn't take a lot of LUTs, and not >> so many cycles, either. I haven't thought about coding it >> in logic lately, though. What about the K?
> As mentioned, I have no problem regarding sqrt, but it is 1 / Sqr(D) > that I have to transform. > The solution I had in mind before posting was to use a non-restore > division (for 1/X) followed by the iterative sqrt. But maybe some one > has a better idea.
In Newton-Raphson, reciprocal square root is easier than just square root, but that might not be true for the subtract based version. Non-restore divide is pretty small, or even restore divide, if you have the clock cycles to spare. How many cycles do you have? -- glen
Of course you'll have to deal with the special case 1/sqrt(0) = 
infinity.  Leaving that aside:

>> You can reduce the coefficients a bit by first left-shifting X by two >> bits repeatedly until you have a 1 in either of the two MSB's. Once >> you have calculated D, right-shift it by 1 bit for each 2 bits you >> left-shifted X. This will save you from having to have high-precision >> lookups for small X.
This is very good advice, it makes the problem *much* easier. This approach works because 1/sqrt(4D) = 1/sqrt(D) >> 1 The HW required is a leading-1 detector that selects whether to left shift the input 16 bits by 0/2/4/8/10/12/14 bits and later to right shift the output by 0/1/2/3/4/5/6/7 bits. This approach means that you now only have to deal with the input range 16384...65535. Why is that an advantage ? Because the function 1/sqrt(D) is highly nonlinear over the range 1...65535 but only mildly nonlinear over the range 16384...65535. Linear interpolation will work *much* better over the reduced range. Some numbers to back this up: derivative of 1/sqrt(D) is -1/2Dsqrt(D) derivative @ 1 = -0.5 derivative @ 16384 = -2.38e-7 derivative @ 65535 = -2.98e-8 The /gradient/ of the curve changes by a factor of roughly 17,000,000 times between 1 and 65535 but only by a factor of 12.5 between 16384 and 65535. In other words, the curve is *much* more linear over the reduced range. Applying piecewise linear interpolation over the reduced range should work well. As an example, using just 1 linear segment will give you around a 20% worst case error (at the middle of the range ie 40960). Using 64 linear segments will give you about 12 bit accuracy. Hope that helps. Stephen Ecob Silicon On Inspiration Sydney Australia www.sioi.com.au
Steve <theecobs@gmail.com> wrote:

(snip, someone wrote)
>>> You can reduce the coefficients a bit by first left-shifting X by two >>> bits repeatedly until you have a 1 in either of the two MSB's. Once >>> you have calculated D, right-shift it by 1 bit for each 2 bits you >>> left-shifted X. This will save you from having to have high-precision >>> lookups for small X.
> This is very good advice, it makes the problem *much* easier.
> This approach works because 1/sqrt(4D) = 1/sqrt(D) >> 1
Usually this would be done with a barrel shifter, but they are not very LUT efficient in many FPGAs. If you aren't so restricted in cycles, though, you can use an ordinary shift register. I am not sure sure how much that saves in cycles or cells. -- glen
glen herrmannsfeldt <gah@ugcs.caltech.edu> wrote:
> Steve <theecobs@gmail.com> wrote:
> (snip, someone wrote) > >>> You can reduce the coefficients a bit by first left-shifting X by two > >>> bits repeatedly until you have a 1 in either of the two MSB's. Once > >>> you have calculated D, right-shift it by 1 bit for each 2 bits you > >>> left-shifted X. This will save you from having to have high-precision > >>> lookups for small X.
> > This is very good advice, it makes the problem *much* easier.
> > This approach works because 1/sqrt(4D) = 1/sqrt(D) >> 1
> Usually this would be done with a barrel shifter, but they are > not very LUT efficient in many FPGAs. If you aren't so restricted > in cycles, though, you can use an ordinary shift register.
You can use a Multiplier as barrel shifter too (Xilinx xapp195) -- Uwe Bonnes bon@elektron.ikp.physik.tu-darmstadt.de Institut fuer Kernphysik Schlossgartenstrasse 9 64289 Darmstadt --------- Tel. 06151 162516 -------- Fax. 06151 164321 ----------
>> Usually this would be done with a barrel shifter, but they are >> not very LUT efficient in many FPGAs. If you aren't so restricted >> in cycles, though, you can use an ordinary shift register. > > You can use a Multiplier as barrel shifter too (Xilinx xapp195)
Given that there are plenty of clock cycles available and not much HW, I'd spend the first 8 clock cycles looking at the two MSBs and left shifting two positions when those MSBs are 00. The detection of the zero input special case can also be folded into this part very efficiently. Stephen Ecob Silicon On Inspiration Sydney Australia www.sioi.com.au