Reply by Ray Andraka December 9, 2004
Moti,

Another trick for NCOs that will double the speed, assuming you are careful
about how you update the tuning word (phase increment), is to use two
accumulators running clock enabled on every other clock.  One is initialized
with 0, the other is initialized with the phase increment value, and then both
are incremented by 2x the phase increment on every other clock.  This allows
two clock periods for the accumulator carry to propagate, and you get the
current and next phase out at the same time on every other clock.  A 2:1 mux
switching at the clock rate selects the output from the two accumulators on
alternate clocks.
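
The scheme above can be checked with a small behavioral model.  This is a
Python sketch, not synthesizable HDL; the function names and the fixed 32-bit
width are assumptions chosen for illustration.  It suggests that two
accumulators stepping by 2x the increment on alternate clocks, plus a mux,
reproduce the phase sequence of one accumulator stepping by 1x on every clock.

```python
MASK = (1 << 32) - 1  # 32-bit phase accumulator, wraps modulo 2**32


def reference_nco(inc, n):
    """Phase sequence of a plain accumulator updated on every clock."""
    phase, out = 0, []
    for _ in range(n):
        out.append(phase)
        phase = (phase + inc) & MASK
    return out


def interleaved_nco(inc, n):
    """Two accumulators enabled on every other clock, plus a 2:1 mux."""
    even, odd = 0, inc & MASK   # initialized with 0 and with 1x the increment
    step = (2 * inc) & MASK     # both advance by 2x the increment
    out = []
    for clk in range(n):
        # The mux switches at the full clock rate.
        out.append(odd if clk & 1 else even)
        # Clock enable: the accumulators only update on every other clock,
        # so each carry chain has two clock periods to settle.
        if clk & 1:
            even = (even + step) & MASK
            odd = (odd + step) & MASK
    return out


if __name__ == "__main__":
    inc = 906238099
    print(interleaved_nco(inc, 8) == reference_nco(inc, 8))
```

The model keeps the tuning word constant.  Changing it on the fly is the part
that needs care, because both the 2x step and the 1x offset between the two
accumulators depend on it.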

I've used this trick also for cases where the mixer it is driving can't run at
the full clock rate (I usually use a CORDIC rotator there, see my XCELL
article about that).  In that case, you use duplicate copies of the mixer, one
for even samples and the other for odd samples.  The phase for both is
incremented by 2x the sample phase increment, with one offset by 1x the phase
increment.

I don't think you'll have to resort to this to get to 200MHz with a
Spartan3, however.  I'm pretty sure careful floorplanning combined with making
sure you have just one level of LUT logic in the critical path (new addend
through carry chain to accumulator register) will get you 200MHz with a very
comfortable margin at 32 bits.
--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

 "They that give up essential liberty to obtain a little
  temporary safety deserve neither liberty nor safety."
                                          -Benjamin Franklin, 1759


Reply by Ray Andraka December 9, 2004
Moti,

There are a couple of things you can do.  First off, if you look closely at
an accumulator, the feedback from a particular bit only affects that bit
and bits with greater significance.  That suggests that you can perform
partial sums and then combine the results.  One simple trick that takes 2x
the resources of the straight accumulator is to break your 32 bit
accumulator into two 16 bit accumulators.  The carry out of the first gets
registered and fed into the carry in of the second.  Note that by doing
that, the second follows the first by a clock cycle, so you need to delay
the upper half of the addend (the new value getting added, not the
feedback value) by a clock cycle using a register so that it arrives at
the upper half of the accumulator at the same time as the registered carry
from the lower half.  Likewise, the lower half sum output (but not the
feedback) has to be delayed by a clock cycle to align it with the upper
half sum.  On the surface, that would seem to permit almost double the
clock speed (and it did in older Xilinx devices), however in Virtex
devices the propagation time to get on and off the carry chain is an order
of magnitude larger than the bit to bit propagation times, so in reality
the gain from this trick is rather small until you get into truly huge
accumulator widths.
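
The register alignment described above is easy to get wrong, so here is a
cycle-by-cycle behavioral model of it.  This is a Python sketch under stated
assumptions, not HDL: the function and variable names are made up for
illustration, and the split is fixed at 16 + 16 bits.  It indicates that the
split accumulator produces the same values as the plain one, one clock later.

```python
def plain_accumulator(addends):
    """32-bit accumulator; returns the register value seen on each clock."""
    acc, out = 0, []
    for a in addends:
        out.append(acc)
        acc = (acc + a) & 0xFFFFFFFF
    return out


def split_accumulator(addends):
    """Two 16-bit halves joined by a registered carry."""
    lo = hi = carry = 0   # lower sum, upper sum, registered carry out of lower
    addend_hi_d = 0       # upper half of the addend, delayed one clock
    lo_d = 0              # lower sum delayed one clock (output only)
    out = []
    for a in addends:
        out.append((hi << 16) | lo_d)
        # Combinational sums computed from the current register values.
        lo_sum = lo + (a & 0xFFFF)
        hi_sum = hi + addend_hi_d + carry
        # All registers update together on the clock edge.
        lo_d = lo                           # delay the output, not the feedback
        addend_hi_d = (a >> 16) & 0xFFFF
        lo, carry = lo_sum & 0xFFFF, lo_sum >> 16
        hi = hi_sum & 0xFFFF
    return out


if __name__ == "__main__":
    addends = [0x0001FFFF, 0x00000001, 0xFFFF0000, 0x12345678, 0]
    print(split_accumulator(addends)[1:] == plain_accumulator(addends)[:-1])
```

The extra clock of latency does not matter for a free-running NCO, but it
would have to be accounted for if the phase is compared against something else.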

A more usable trick requires a little more attention to the design
implementation.  The carry chains are typically the critical path (mostly
because the times to get on and off the chain are on par with the LUT
delay).  You can't do much of anything about the delay in the carry chain or
the intrinsic delay for getting on and off the chain.  You can, however,
minimize the delays on the signal connecting to the carry chain input.
This means making sure that you only have one level of logic (i.e.,
flip-flops are directly driving the LUTs that feed the carry chain), and
you need to make sure those flip-flops are placed in close proximity to
the carry chain (ideally either in the same CLB, or on an adjacent CLB so
you can use the direct connect wires).  Note that the automatic placement
is not particularly good at making sure those flip-flops are placed this
way.  The accumulator feedback doesn't need to be pipelined because it is
connecting back around to the same bit (assuming you've reduced the logic
to 1 level), which means it is already pipelined as much as it could be.
You may need to pipeline the new addend path in order to achieve the one
level of logic at the accumulator and keep the driving flip-flops in
adjacent slices, but that is OK as it doesn't affect the accumulator
operation.

You normally should use active high resets because that is what is native
to the fpga.  In this case, I don't think it is affecting your timing
however, because it is an asynchronous reset.  Had it been a synchronous
reset, some synthesizers, rather than inverting the resetn signal, would
have inserted a gate between the carry chain and the register, which would
have added an extra LUT delay to the input path.

Hope this helps


Moti Cohen wrote:

> Hello all,
> I've a design that contains a NCO (Numerically controlled oscillator).
> The NCO consists of a 32'bit accumulator. when i write the accumulator
> straight forward like this -
>
> process (clk,resetn)
> begin
>   if resetn = '0' then
>     accumulator <= (others =>'0');
>   elsif clk'event and clk ='1' then
>     accumulator <= accumulator + inc_value;
>   end if;
> end process;
> Fout <= accumulator (accumulator'high);
>
> the maximum frequency I can achive for 'clk' is ~ 150 MHz (spartan 3).
> I need it to work in ~200 MHz so I figured out that some pipelining is
> needed but I dont know how to do it because of the accumulator
> feedback. Maybe someone here can explain it to me or even give me a
> code example (which will be great).
>
> Thanks in advance, Moti.
Reply by rickman December 9, 2004
Allan Herriman wrote:
>
> On Wed, 08 Dec 2004 18:04:31 -0500, rickman <spamgoeshere4@yahoo.com>
> wrote:
>
> I must be a 'no one'.
Well, I wouldn't go *that* far.. :)
> Rick, we have discussed this before, e.g. in this thread:
> http://groups-beta.google.com/group/comp.arch.embedded/browse_frm/thread/7e0ec68b5c53e4
You have a *much* better memory than I do. I think I had looked into this, but my idea was rejected by higher ups in favor of a specialized chip that actually used the top N bits of the accumulator to drive a DAC. This sine wave was then filtered and fed back to the chip for clipping via a comparator.
> This is something I've done in real designs. I've also developed
> tools for estimating the output jitter of the NCO, taking the loop
> bandwidth (and order) of the PLL into account.
> It is possible to achieve very low levels of jitter at the PLL output,
> if the frequencies are carefully chosen such that the higher level
> spurious signals at the output of the NCO are well outside the PLL
> loop bandwidth.
I looked at the posts that you refer to. That post has some defunct links for other posts or web pages. Heck, a couple of them are to altavista that doesn't even refer you to whoever bought them. Things change fast on the internet.
> >I always figured that the low pass filter would do the smoothing for
> >me.
>
> Exactly. Although this does require the phase detector to be linear
> (otherwise the jitter signals will be demodulated). Common phase
> detector types (e.g. most digital phase detectors driving charge
> pumps) aren't particularly linear due to inexact balance between the
> pull-up and pull-down current sources. A figure of 10% is sometimes
> quoted.
What phase detectors *are* linear?

--

Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design      URL http://www.arius.com
4 King Ave                               301-682-7772 Voice
Frederick, MD 21701-3110                 301-682-7666 FAX
Reply by Allan Herriman December 8, 2004
On Wed, 08 Dec 2004 18:04:31 -0500, rickman <spamgoeshere4@yahoo.com>
wrote:

>John_H wrote:
>>
>> Noise shaping is the right way to go for a superb quality synthesizer, but
>> the correction phase error - the output from the noise shaper - needs to be
>> applied based on the synchronous edge position relative to the "ideal" edge
>> position - the input to the noise shaper. (Pseudo)Random doesn't do it.
>>
>> All this assumes, of course, that there's an analog PLL driven by the single
>> bit, noise-shaped NCO output. Without the PLL to filter out the high
>> frequency phase noise of a Sigma-Delta style NCO, the jitter is still around
>> 1 reference clock period peak-to-peak, maybe worse.
>
>That answers a question I have had for a long time. It occured to me a
>long time ago to use an analog PLL to smooth out the ragged edges in an
>NCO clock. But no one I spoke to about it could say if it would work.
I must be a 'no one'.

Rick, we have discussed this before, e.g. in this thread:
http://groups-beta.google.com/group/comp.arch.embedded/browse_frm/thread/7e0ec68b5c53e4

This is something I've done in real designs. I've also developed
tools for estimating the output jitter of the NCO, taking the loop
bandwidth (and order) of the PLL into account.
It is possible to achieve very low levels of jitter at the PLL output,
if the frequencies are carefully chosen such that the higher level
spurious signals at the output of the NCO are well outside the PLL
loop bandwidth.
>I always figured that the low pass filter would do the smoothing for
>me.
Exactly. Although this does require the phase detector to be linear
(otherwise the jitter signals will be demodulated). Common phase
detector types (e.g. most digital phase detectors driving charge
pumps) aren't particularly linear due to inexact balance between the
pull-up and pull-down current sources. A figure of 10% is sometimes
quoted.

Regards,
Allan
Reply by rickman December 8, 2004
John_H wrote:
>
> Noise shaping is the right way to go for a superb quality synthesizer, but
> the correction phase error - the output from the noise shaper - needs to be
> applied based on the synchronous edge position relative to the "ideal" edge
> position - the input to the noise shaper. (Pseudo)Random doesn't do it.
>
> All this assumes, of course, that there's an analog PLL driven by the single
> bit, noise-shaped NCO output. Without the PLL to filter out the high
> frequency phase noise of a Sigma-Delta style NCO, the jitter is still around
> 1 reference clock period peak-to-peak, maybe worse.
That answers a question I have had for a long time. It occurred to me a
long time ago to use an analog PLL to smooth out the ragged edges in an
NCO clock. But no one I spoke to about it could say if it would work.

I always figured that the low pass filter would do the smoothing for
me. I should never have doubted myself. ;)
Reply by Falk Brunner December 8, 2004
"John_H" <johnhandwork@mail.com> wrote in message
news:S9std.14$a61.1075@news-west.eli.net...

> All this assumes, of course, that there's an analog PLL driven by the single
> bit, noise-shaped NCO output. Without the PLL to filter out the high
> frequency phase noise of a Sigma-Delta style NCO, the jitter is still around
> 1 reference clock period peak-to-peak, maybe worse.
Yes.
> (NCOs are used by many folks in the comp.arch.fpga newsgroup who have no
> reason to visit comp.dsp.)
???? Don't get it.

Regards
Falk
Reply by John_H December 7, 2004
"Falk Brunner" <Falk.Brunner@gmx.de> wrote in message
news:31mt3nF3d9asiU1@individual.net...
>
> "John_H" <johnhandwork@mail.com> wrote in message
> news:yB5td.10$a61.633@news-west.eli.net...
>
> > The difference between 906,238,099/2^32 and 906,238,099.456/2^32 is about
> > 5.03e-10 at which point small amounts of jitter are lost. If the jitter at
> > that tiny offset is large, you will experience phase jumps when that beat
> > frequency is felt. There's no way to filter those with analog filters.
>
> I guess the trick is noise shaping. Adding a (pseudo)random phase error to
> distribute the jitter energy over a wider band and also move it to higher
> frequencies. Sigma-Delta Style.
>
> Regards
> Falk
Noise shaping is the right way to go for a superb quality synthesizer, but
the correction phase error - the output from the noise shaper - needs to be
applied based on the synchronous edge position relative to the "ideal" edge
position - the input to the noise shaper. (Pseudo)Random doesn't do it.

All this assumes, of course, that there's an analog PLL driven by the single
bit, noise-shaped NCO output. Without the PLL to filter out the high
frequency phase noise of a Sigma-Delta style NCO, the jitter is still around
1 reference clock period peak-to-peak, maybe worse.

(NCOs are used by many folks in the comp.arch.fpga newsgroup who have no
reason to visit comp.dsp.)
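
To put a number on the "1 reference clock period" figure, here is a small
behavioral sketch in Python.  It models a plain phase accumulator with no
noise shaping (an assumption; it is not the Sigma-Delta style NCO discussed
above), and the function name is made up for illustration.  Each output edge
can only happen on a clock edge, so it lands late relative to the ideal
edge by some fraction of a clock period, and that lateness is what the
function returns.

```python
def edge_lateness(inc, n_clocks, bits=32):
    """Lateness of each accumulator wrap, in reference clock periods.

    The ideal edge falls where the continuous phase would cross 2**bits.
    The real edge is the first clock at or after that point, and the
    residue left in the accumulator after the wrap, divided by the
    increment, says how far past the ideal point that clock was.
    Assumes 0 < inc < 2**bits.
    """
    modulus = 1 << bits
    phase, lateness = 0, []
    for _ in range(n_clocks):
        phase += inc
        if phase >= modulus:            # accumulator wraps: an output edge
            phase -= modulus
            lateness.append(phase / inc)
    return lateness


if __name__ == "__main__":
    errs = edge_lateness(906238099, 10_000)
    print(len(errs), min(errs), max(errs))
```

With the tuning word from earlier in the thread, the spread between the
smallest and largest lateness comes out close to one full clock period,
which is the jitter an analog PLL would then have to filter.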
Reply by Falk Brunner December 7, 2004
"John_H" <johnhandwork@mail.com> wrote in message
news:yB5td.10$a61.633@news-west.eli.net...

> The difference between 906,238,099/2^32 and 906,238,099.456/2^32 is about
> 5.03e-10 at which point small amounts of jitter are lost. If the jitter at
> that tiny offset is large, you will experience phase jumps when that beat
> frequency is felt. There's no way to filter those with analog filters.
I guess the trick is noise shaping. Adding a (pseudo)random phase error to
distribute the jitter energy over a wider band and also move it to higher
frequencies. Sigma-Delta Style.

Regards
Falk
Reply by Hal Murray December 7, 2004
>>>every 79 cycles, to see => 14 @ 250KHz, 65@ 200KHz => 210.76923Khz,
> It also seems strange to not see 200KHz, 250KHz... ?
We started with a 1 MHz clock. Right.

The above recipe repeats after 14*4 + 65*5 cycles. That's a total of
381 uSec, or a repeat rate of 2.624671 KHz.

How do I get 200 KHz or 250 KHz from that? What harmonic?
  200 / 2.624671 => 76.200026
  250 / 2.624671 => 95.250033
Those aren't close enough to integers for rounding to explain
the differences.

(I might have fatfingered something.)

--
The suespammers.org mail server is located in California. So are all my
other mailboxes. Please do not send unsolicited bulk e-mail or unsolicited
commercial e-mail to my suespammers.org address or any of my other addresses.
These are my opinions, not necessarily my employer's. I hate spam.
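
The arithmetic in this post can be reproduced with a few lines of Python.
This is only a sketch: the variable and function names are made up, and it
assumes that with a 1 MHz clock a 250 KHz output cycle lasts 4 clocks and a
200 KHz cycle lasts 5 clocks, which is the reading that gives the 381 uSec
total quoted in the post.

```python
CLOCK_HZ = 1_000_000  # 1 MHz reference clock, as stated in the post

# 14 output cycles at 250 KHz (4 clocks each) and 65 at 200 KHz (5 clocks each)
pattern_clocks = 14 * 4 + 65 * 5

# The whole pattern repeats once per pattern_clocks reference clocks.
repeat_hz = CLOCK_HZ / pattern_clocks


def harmonic_number(freq_hz):
    """How many times the pattern repeat rate fits into freq_hz."""
    return freq_hz / repeat_hz


if __name__ == "__main__":
    print(pattern_clocks, repeat_hz)
    print(harmonic_number(200_000), harmonic_number(250_000))
```

Neither ratio is an integer, which matches the observation that 200 KHz and
250 KHz are not harmonics of the pattern repeat rate.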
Reply by rickman December 7, 2004
Moti wrote:
>
> Hi Falk,
> My german is pretty "rusty" :) so if the document is in .pdf format it
> will very hard...
> but if it's in a html format it can translated by google and then it
> will be possible to read it!
> Regards, Moti.
You should be able to copy and paste the text from a PDF into a web page for translation. But my experience has been that web page translations give you English that is not much easier to understand than the language you are translating from.