Reply by Kevin Neilson October 18, 2018
On Wednesday, October 17, 2018 at 9:26:38 AM UTC-6, Benjamin Couillard wrote:
> On Tuesday, October 16, 2018 at 10:36:56 PM UTC-4, Kevin Neilson wrote:
> > > Yes I think you're right it could work for an atan or atan2. You could implement a divider for atan2 (y/x), a sign look-up (to get the whole 360 degrees), a BRAM + Taylor interpolation.
> > >
> > > For atan, you'd simply skip the divider and sign look-up part.
> > >
> > > On the other hand, Xilinx and Altera offer plug-and-play Cordic cores for atan/atan2.
> >
> > It's probably better to multiply y/x by x to get a normalized ratio rather than use a divider, which requires a lot of resources.
> >
> > I recall that I once had to implement the atan2 function for a QAM carrier/symbol recovery circuit. I didn't need great precision, so I split one quadrant into a grid and put the angle of the center of each grid square into a 2-dimensional lookup ROM. Then I could put X,Y into the ROM and get the coarse angle (which was then adjusted for the quadrant) and could use that for carrier recovery.
>
> One case that comes to mind: you have 2 quadrature signals
>
> x = A(t) * cos (wt + some phase) + noise_x
> y = A(t) * sin (wt + some phase) + noise_y
>
> Atan2(y, x) = wt + some phase. The variations of A(t) (as long as they are slow-ish) will cancel each other out.
>
> You can filter or average wt + some phase to extract the phase. Or differentiate "wt + some phase" then filter to get the filtered frequency.
>
> So multiplying "y/x by x" would not make much sense in this case.
No, multiplying by x doesn't make sense. Perhaps using a ROM for 1/x and a multiplier would be better than a full divider.
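A minimal sketch of the reciprocal-ROM idea, written as a Python fixed-point model rather than HDL. The widths (`W`, `ADDR_BITS`, `FRAC_BITS`), the normalisation step and the function names are all assumptions made for illustration; nothing here comes from an actual design in this thread.

```python
# Hypothetical model: a ROM of 1/x indexed by the top bits of x, followed by
# one multiply, standing in for a full divider. All parameters are assumed.
W = 16            # assumed input width in bits
ADDR_BITS = 10    # assumed ROM depth: 2**10 entries
FRAC_BITS = 16    # assumed fractional bits in ROM entries and in the result

def build_recip_rom():
    """Entry i approximates 2**(W-1) / x at the centre of bin i, scaled by 2**FRAC_BITS."""
    step = 1 << (W - 1 - ADDR_BITS)          # number of x values per ROM bin
    rom = []
    for i in range(1 << ADDR_BITS):
        x_centre = (1 << (W - 1)) + i * step + step / 2
        rom.append(round((1 << (W - 1)) * (1 << FRAC_BITS) / x_centre))
    return rom

RECIP_ROM = build_recip_rom()

def ratio_via_rom(y, x):
    """Approximate y/x for 0 <= y <= x < 2**W, x > 0, as an integer scaled by 2**FRAC_BITS."""
    # Normalise: shift both operands left until the MSB of x is set.
    # The ratio y/x is unchanged by a common shift.
    shift = W - x.bit_length()
    x <<= shift
    y <<= shift
    # The bits just below the MSB of x select the ROM entry.
    idx = (x >> (W - 1 - ADDR_BITS)) & ((1 << ADDR_BITS) - 1)
    # One multiplier replaces the divider.
    return (y * RECIP_ROM[idx]) >> (W - 1)
```

With these assumed sizes the ratio is good to roughly 10 bits, limited by the ROM bin width; interpolating between ROM entries would be one way to tighten that.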
Reply by Benjamin Couillard October 17, 2018
On Tuesday, October 16, 2018 at 10:36:56 PM UTC-4, Kevin Neilson wrote:
> > Yes I think you're right it could work for an atan or atan2. You could implement a divider for atan2 (y/x), a sign look-up (to get the whole 360 degrees), a BRAM + Taylor interpolation.
> >
> > For atan, you'd simply skip the divider and sign look-up part.
> >
> > On the other hand, Xilinx and Altera offer plug-and-play Cordic cores for atan/atan2.
>
> It's probably better to multiply y/x by x to get a normalized ratio rather than use a divider, which requires a lot of resources.
>
> I recall that I once had to implement the atan2 function for a QAM carrier/symbol recovery circuit. I didn't need great precision, so I split one quadrant into a grid and put the angle of the center of each grid square into a 2-dimensional lookup ROM. Then I could put X,Y into the ROM and get the coarse angle (which was then adjusted for the quadrant) and could use that for carrier recovery.

One case that comes to mind: you have 2 quadrature signals

x = A(t) * cos (wt + some phase) + noise_x
y = A(t) * sin (wt + some phase) + noise_y

Atan2(y, x) = wt + some phase. The variations of A(t) (as long as they are slow-ish) will cancel each other out.

You can filter or average wt + some phase to extract the phase. Or differentiate "wt + some phase" then filter to get the filtered frequency.

So multiplying "y/x by x" would not make much sense in this case.
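A small numerical illustration of the amplitude-cancellation point, as a sketch in Python. The noise terms are left out, and the function names, amplitude values and sample rate are assumptions chosen for the example.

```python
import math

def wrap(angle):
    """Wrap an angle to the interval (-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))

def recovered_phase(amplitudes, w, phi, dt):
    """atan2(y, x) for noiseless quadrature samples whose amplitude A varies per sample."""
    phases = []
    for n, a in enumerate(amplitudes):
        theta = w * n * dt + phi
        x = a * math.cos(theta)
        y = a * math.sin(theta)
        # The common factor a > 0 scales x and y equally, so it drops out here.
        phases.append(math.atan2(y, x))
    return phases

def frequency_estimates(phases, dt):
    """Differentiate the wrapped phase to estimate w (valid while |w * dt| < pi)."""
    return [wrap(b - a) / dt for a, b in zip(phases, phases[1:])]
```

In a real receiver the frequency estimates would then be filtered or averaged, as described above, to suppress the contribution of noise_x and noise_y.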
Reply by Kevin Neilson October 16, 2018
> Yes I think you're right it could work for an atan or atan2. You could implement a divider for atan2 (y/x), a sign look-up (to get the whole 360 degrees), a BRAM + Taylor interpolation.
>
> For atan, you'd simply skip the divider and sign look-up part.
>
> On the other hand, Xilinx and Altera offer plug-and-play Cordic cores for atan/atan2.

It's probably better to multiply y/x by x to get a normalized ratio rather than use a divider, which requires a lot of resources.

I recall that I once had to implement the atan2 function for a QAM carrier/symbol recovery circuit. I didn't need great precision, so I split one quadrant into a grid and put the angle of the center of each grid square into a 2-dimensional lookup ROM. Then I could put X,Y into the ROM and get the coarse angle (which was then adjusted for the quadrant) and could use that for carrier recovery.
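The grid-ROM approach might be modelled along these lines. This is only a sketch: the grid size (`GRID_BITS`), the input width and the function names are assumptions, not details of the original QAM design.

```python
import math

GRID_BITS = 4   # assumed: 16 x 16 grid squares covering one quadrant
WIDTH = 8       # assumed: |x| and |y| are below 2**WIDTH

def build_atan2_rom(bits=GRID_BITS):
    """rom[yi][xi] holds the angle of the centre of grid square (xi, yi) in quadrant 1."""
    n = 1 << bits
    return [[math.atan2(yi + 0.5, xi + 0.5) for xi in range(n)] for yi in range(n)]

ROM = build_atan2_rom()

def coarse_atan2(y, x):
    """Coarse atan2: look up the first-quadrant angle, then adjust for the quadrant."""
    # The top GRID_BITS bits of each magnitude address the 2-D ROM.
    xi = abs(x) >> (WIDTH - GRID_BITS)
    yi = abs(y) >> (WIDTH - GRID_BITS)
    angle = ROM[yi][xi]
    # Quadrant adjustment from the signs of x and y.
    if x < 0:
        angle = math.pi - angle
    if y < 0:
        angle = -angle
    return angle
```

The angular error grows as the point approaches the origin, because a grid square then spans a wider range of angles; for carrier recovery on reasonably large symbols that may be acceptable.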
Reply by Benjamin Couillard October 12, 2018
On Friday, October 12, 2018 at 12:42:18 AM UTC-4, Kevin Neilson wrote:
> On Thursday, October 11, 2018 at 7:37:37 AM UTC-6, Benjamin Couillard wrote:
>
> > Cordic is useful to compute high-precision atan2. Otherwise for 2 16-bit inputs, you'd need a ram with 2^32 addresses (maybe 16 times less if you take advantage of the symmetries).
>
> Sorry; I didn't see that you were talking about arctan. It's not quite as easy as sin/cos, but there is still the question as to whether a Farrow-type architecture using a coarse lookup and Taylor interpolations would be better than a CORDIC, and I am guessing that with the BRAMs and multipliers in present-day FPGAs, the answer would be yes.

Yes I think you're right it could work for an atan or atan2. You could implement a divider for atan2 (y/x), a sign look-up (to get the whole 360 degrees), a BRAM + Taylor interpolation.

For atan, you'd simply skip the divider and sign look-up part.

On the other hand, Xilinx and Altera offer plug-and-play Cordic cores for atan/atan2.
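One way the divider plus sign look-up structure could fit together, sketched in Python. The function name and the octant folding (swapping the operands so the ratio stays in [0, 1]) are assumptions added for illustration; `math.atan` stands in for whatever BRAM + Taylor block would compute the arctangent.

```python
import math

def atan2_from_atan(y, x, atan01=math.atan):
    """atan2 built from an arctangent that only needs to be valid on [0, 1]."""
    ax, ay = abs(x), abs(y)
    if ax == 0 and ay == 0:
        return 0.0                      # undefined input; return 0 by convention
    if ay <= ax:
        # Divider output is in [0, 1]; angle is in the first octant.
        angle = atan01(ay / ax)
    else:
        # Swap the operands so the ratio stays in [0, 1], then reflect about 45 degrees.
        angle = math.pi / 2 - atan01(ax / ay)
    # Sign look-up: extend the first-quadrant angle to the full 360 degrees.
    if x < 0:
        angle = math.pi - angle
    if y < 0:
        angle = -angle
    return angle
```

Keeping the ratio in [0, 1] bounds the table range, which is the main reason for the folding step in this sketch.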
Reply by Kevin Neilson October 12, 2018
On Thursday, October 11, 2018 at 7:37:37 AM UTC-6, Benjamin Couillard wrote:

> Cordic is useful to compute high-precision atan2. Otherwise for 2 16-bit inputs, you'd need a ram with 2^32 addresses (maybe 16 times less if you take advantage of the symmetries).
Sorry; I didn't see that you were talking about arctan. It's not quite as easy as sin/cos, but there is still the question as to whether a Farrow-type architecture using a coarse lookup and Taylor interpolations would be better than a CORDIC, and I am guessing that with the BRAMs and multipliers in present-day FPGAs, the answer would be yes.
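A rough model of what a coarse lookup with Taylor interpolation could look like for arctan on [0, 1]. The table size `N`, the choice to store the function value together with its first derivative and half its second derivative, and the names are all assumptions; the thread does not specify these details.

```python
import math

N = 64  # assumed number of coarse table entries covering [0, 1]

def build_atan_table(n=N):
    """Per entry: atan(c), atan'(c) and atan''(c)/2 at the bin centre c."""
    table = []
    for i in range(n):
        c = (i + 0.5) / n
        f0 = math.atan(c)
        f1 = 1.0 / (1.0 + c * c)            # first derivative of atan
        f2 = -c / (1.0 + c * c) ** 2        # half of the second derivative
        table.append((f0, f1, f2))
    return table

ATAN_TABLE = build_atan_table()

def atan_interp(t, n=N):
    """2nd-order Taylor interpolation around the nearest bin centre, for 0 <= t <= 1."""
    i = min(int(t * n), n - 1)              # coarse index (the table address)
    d = t - (i + 0.5) / n                   # fine offset from the bin centre
    f0, f1, f2 = ATAN_TABLE[i]
    return f0 + d * (f1 + d * f2)           # Horner form: two multiplies
```

Unlike the sin/cos case, the derivatives of arctan are not themselves table outputs of the same function, so this sketch stores them as extra table columns.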
Reply by Kevin Neilson October 11, 2018
> Cordic is useful to compute high-precision atan2. Otherwise for 2 16-bit inputs, you'd need a ram with 2^32 addresses (maybe 16 times less if you take advantage of the symmetries).
I think you missed the part about the Taylor interpolation.
Reply by Benjamin Couillard October 11, 2018
On Wednesday, October 10, 2018 at 9:52:06 PM UTC-4, Kevin Neilson wrote:
> > What do you think?
>
> That's good stuff. I wonder why you think using a BRAM is bad? It's good to use the hard cores in an FPGA instead of the fabric--it's less power and deterministic routing times.
>
> Is the CORDIC still advantageous in a modern FPGA? The last time I needed to find sine, I used a coarse BRAM lookup that output sine on one port and cos on another. Those were used as derivatives for a 2nd-order Taylor. Two multipliers (more hard cores) are used (using Horner's Rule) for the 1st and 2nd-order interpolations. I don't remember how many digits of accuracy this yields, but the latency is low.
Cordic is useful to compute high-precision atan2. Otherwise for 2 16-bit inputs, you'd need a ram with 2^32 addresses (maybe 16 times less if you take advantage of the symmetries).
Reply by Kevin Neilson October 10, 2018
> What do you think?
That's good stuff. I wonder why you think using a BRAM is bad? It's good to use the hard cores in an FPGA instead of the fabric--it's less power and deterministic routing times.

Is the CORDIC still advantageous in a modern FPGA? The last time I needed to find sine, I used a coarse BRAM lookup that output sine on one port and cos on another. Those were used as derivatives for a 2nd-order Taylor. Two multipliers (more hard cores) are used (using Horner's Rule) for the 1st and 2nd-order interpolations. I don't remember how many digits of accuracy this yields, but the latency is low.
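The coarse-lookup scheme can be modelled in floating point roughly as follows. The table depth (`N`) and all names are assumptions, and real hardware would use fixed-point values; the point is only to show how the sine and cosine outputs serve as the Taylor derivatives and how Horner's Rule keeps it to two multiplies.

```python
import math

N = 1024                      # assumed table depth covering one full cycle
STEP = 2 * math.pi / N

# Dual-port table: one port supplies sin(c), the other cos(c), at bin centre c.
SIN_COS_TABLE = [(math.sin((i + 0.5) * STEP), math.cos((i + 0.5) * STEP))
                 for i in range(N)]

def sin_taylor(theta):
    """2nd-order Taylor sine: sin(c + d) ~ sin c + d*cos c - (d*d/2)*sin c."""
    theta %= 2 * math.pi
    i = min(int(theta / STEP), N - 1)     # coarse index (the BRAM address)
    d = theta - (i + 0.5) * STEP          # fine offset from the bin centre
    s, c = SIN_COS_TABLE[i]
    # Horner's Rule: the inner multiply gives the 2nd-order term, the outer
    # multiply the 1st-order term. The factor 0.5 is a shift in hardware.
    return s + d * (c - 0.5 * d * s)
```

With these assumed sizes the truncation error is bounded by d^3/6 with |d| <= pi/1024, which is about 5e-9, before any fixed-point quantisation is taken into account.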
Reply by Mike Field September 10, 2018
On Wednesday, 5 September 2018 23:43:28 UTC+12, Gene Filatov wrote:
> I don't know the literature well, but I think it would be cool if you
> actually write an article detailing your approach!
>
> Gene

I'll work on writing one up over the next few days, as well as posting a sample implementation.

Mike
Reply by Brian Davis September 7, 2018
earlier, I wrote:
> If perchance this is related to your recent CORDIC rotator code,
> I've seen a number of CORDIC optimization schemes over the years
> to reduce the number of rotation stages, IIRC typically either
> by a 'jump start' or merging/optimizing rotation stages.

oops, for some reason, when first reading this thread I didn't see the later posts with the explanation... I'd swear they weren't there, but maybe I was just scroll-impaired.

-Brian