Gurus,

I have built and tested a numerically-controlled oscillator (clock generator) using a simple phase accumulator (adder) and two registers. One register contains the tuning word (N), and the other is used in the feedback loop into the second input of the adder.

I take the MSB of the feedback register as my synthesised clock. I am generating sub-50 kHz clock frequencies by clocking the feedback register at 100 MHz. The accumulator is a 32-bit adder, as is the feedback register (of course). Works nicely on a board (my tuning word comes from a processor chip, and my spectrum analyzer tells the truth when I look at my MSB-generated clock).

To reduce the jitter I would like to run two or more phase accumulators in parallel which are clock-enabled on every other clock cycle (as per Ray Andraka's suggestion from the "how to speed up my accumulator" post by Moti in Dec 2004) and then switch between the MSBs of each accumulator using a MUX on the MSBs.

The problem then comes down to how fast I can switch the MUX - the faster the better.

1. Is the Xilinx CoreGen 1-bit MUX a good option?
2. For a 4-input 1-output MUX I would need a 2-bit counter counting the select word in sequence 00, 01, 10, 11, 00 ... - how fast could this be done?
3. What about using a fast parallel-to-serial converter approach? (Feeding the outputs of each NCO into a shift register and then blasting out the bits really fast to a pin - effectively doing a round-robin type switching between the MSB of each NCO.)

I have designed (but not yet implemented) this scheme, and I would like some advice on how best to do this. I look forward to everyone's replies!

Cheers, PeterC.
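For readers who want to check the numbers, here is a minimal behavioural sketch of the accumulator described above, written in Python rather than HDL. The function names are made up for illustration; the 32-bit width and 100 MHz clock are taken from the post. It assumes the usual NCO relation f_out = N * f_clk / 2^32.

```python
ACC_BITS = 32     # accumulator width from the post
F_CLK = 100e6     # accumulator clock in Hz, from the post


def tuning_word(f_out, f_clk=F_CLK, bits=ACC_BITS):
    """Nearest integer tuning word N for the requested output frequency."""
    return round(f_out * (1 << bits) / f_clk)


def output_frequency(n, f_clk=F_CLK, bits=ACC_BITS):
    """Average frequency of the accumulator MSB for tuning word n."""
    return n * f_clk / (1 << bits)


def msb_stream(n, cycles, bits=ACC_BITS):
    """Yield the accumulator MSB after each clock edge (hypothetical helper)."""
    mask = (1 << bits) - 1
    acc = 0
    for _ in range(cycles):
        acc = (acc + n) & mask      # adder output registered on every clock
        yield acc >> (bits - 1)     # MSB is the synthesised clock
```

With these figures the frequency resolution is f_clk / 2^32, roughly 0.023 Hz, so the average frequency is very fine-grained even though each individual edge can only land on a 10 ns clock boundary.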
Parallel NCO (DDS) in Spartan3 for clock synthesis - highest possible speed?
Started by ●February 8, 2006
Reply by ●February 9, 2006
Peter,

You need to get the MSB out of the FPGA, right? Look at using the double data rate FFs in the IOBs. I think the FF is called FDDRCPE in the libraries guide. This will let you get data out of the FPGA at twice your clock rate, one bit on the rising edge, one on the falling edge. This is probably the fastest, and certainly the easiest and most reliable, way to mux data out of the part. Also, use a DCM for your clock to make sure you have a 50% duty factor on the clock.

Check out XAPP265. That guy gets 840 Mbps out of an LVDS output.

HTH and good luck, Syms.
Reply by ●February 9, 2006
"PeterC" <peter@geckoaudio.com> schrieb im Newsbeitrag news:1139454506.414529.307760@g47g2000cwa.googlegroups.com...> > Gurus, > > I have built and tested a numerically-controlled oscillator (clock > generator) using a simple phase accumulator (adder) and two registers. > One register contains the tuning word (N), and the other is used in > the > feedback loop into the second input of the adder. > > I take the MSB of the feedback register as my synthesised clock. I am > generating sub 50kHz clock frequencies, by clocking the feedback > register at 100 MHz. The accumulator is a 32 bit adder as is the > feedback register (of course). Works nicely on a board (my tuning word > comes from a processor chip, and my spectrum analyzer tells the truth > when I look at my MSB generated clock). > > To reduce the jitter I would like to run two or more phase > accumulators > in parallel which are clock-enabled on every-other clock cycle (as per > Ray Andraka's suggestion from the "how to speed up my accumulator" > post > by Moti in Dec 2004) and then switch between the MSBs of each > accumulator using a MUX on the MSBs.What about going analog ? This means: Build a R2R DAC with 2 CAT16 Respacks from 4 output's and then anti alias filter it. I use this approach to generate a high spectral purity 27MHz +-1% clock with a 48Bit DDS running at 100MHz. I generate a 5 Bit sine value out of a 16 entry ROM, dither this to 4 Bit at 200MHz (with help of the DDR IOB FF's). I connect the output node of the R2R DAC to a LC Parallel Resonant ciruit (the Filter) at 27MHz. This filter has the additional advantage to filter out more than the aliasing frequencies (also the quantization noise away from the 27MHz filter center) than a multiple order low-pass _and_ being much simpler and cheaper. This is then fed to the "receiving gate", a LVDS transmitter in my case making the analog sine wave a good digital signal. The spectral purity can get quite high. 
And there is still room for additional improvement. Raymund Hofmann
Reply by ●February 9, 2006
Symon wrote:
> Peter,
> You need to get the MSB out of the FPGA, right? Look at using the double data rate FFs in the IOBs. I think the FF is called FDDRCPE in the libraries guide. This will let you get data out of the FPGA at twice your clock rate, one bit on the rising edge, one on the falling edge. This is probably the fastest, certainly the easiest and most reliable way to mux data out of the part. Also, use a DCM for your clock to make sure you have a 50% duty factor on the clock.
> Check out XAPP265. That guy gets 840 Mbps out of an LVDS output.
> HTH and good luck, Syms.

The DCM for 50% duty cycle correction is great. I'd add two things:

1) the phase accumulator can be staged so you have four 8-bit adders instead of one 32-bit adder, allowing higher accumulator speeds, and
2) don't implement the full phase accumulator for the multiple NCO copies; use one phase accumulator but add different phase values (N/4, N/2, 3N/4, N) for the different MSBs. This way your accumulators will never be mis-synchronized.

If your frequency range is always tight (e.g., 25-50 kHz) you can even reduce the resolution of the non-accumulating adders (N/4, N/2, 3N/4).

For real *fun* you can use bit-serial arithmetic to do a 32-bit NCO, then do a bit-serial divider to figure out what fraction of N the accumulator had when (and only when) it rolled over. While this isn't your typical 30-minute design session, it can be a great learning experience! I designed a bit-serial NCO a while back and know how to do nice pipelined dividers but haven't yet implemented those as bit-serial elements. Since your 50 kHz or lower speed gives 2k cycles at 100 MHz (or 8k cycles at 400 MHz), you could use the technique to give you the maximum achievable DDR output rate the chip can support. Bit-serial is really amazing in this respect.

In any case, the speed of the MUX you choose shouldn't be the limiting factor in your design. With the DDR IO register and pipelining, the MUX functionality can be 1 LUT of logic between registers at the maximum chip speed.
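Point (2) can be checked with a small Python model. This is only a behavioural sketch under stated assumptions: the function names are invented, N is taken to be a multiple of 4 so that N/4 is exact, and the offsets used are 0, N/4, N/2, 3N/4, which is the same set as (N/4, N/2, 3N/4, N) shifted by one sub-step. It compares one accumulator with three offset adders against a single accumulator clocked four times faster.

```python
BITS = 32
MASK = (1 << BITS) - 1


def fast_nco_msbs(step, ticks):
    """Reference: one accumulator clocked at 4x, adding `step` per tick."""
    acc, out = 0, []
    for _ in range(ticks):
        out.append(acc >> (BITS - 1))
        acc = (acc + step) & MASK
    return out


def offset_nco_msbs(n, cycles):
    """One accumulator at 1x adding n, plus non-accumulating offset adders.

    Each slow cycle emits four MSBs, taken from acc + k*n/4 for k = 0..3,
    in the order an output serialiser would send them.
    """
    assert n % 4 == 0, "sketch assumes n is a multiple of 4"
    quarter = n // 4
    acc, out = 0, []
    for _ in range(cycles):
        for k in range(4):
            out.append(((acc + k * quarter) & MASK) >> (BITS - 1))
        acc = (acc + n) & MASK      # only this adder feeds back
    return out
```

For such N the two bit streams are identical, which is why the copies cannot drift apart: there is only one feedback register. If N is not a multiple of 4, the truncated offsets introduce an error of less than one accumulator LSB per phase.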
Reply by ●February 9, 2006
PeterC wrote:
> Gurus,
>
> I have built and tested a numerically-controlled oscillator (clock generator) using a simple phase accumulator (adder) and two registers. One register contains the tuning word (N), and the other is used in the feedback loop into the second input of the adder.
>
> I take the MSB of the feedback register as my synthesised clock. I am generating sub 50kHz clock frequencies, by clocking the feedback register at 100 MHz. The accumulator is a 32 bit adder as is the feedback register (of course). Works nicely on a board (my tuning word comes from a processor chip, and my spectrum analyzer tells the truth when I look at my MSB generated clock).
>
> To reduce the jitter I would like to run two or more phase accumulators in parallel which are clock-enabled on every-other clock cycle (as per Ray Andraka's suggestion from the "how to speed up my accumulator" post by Moti in Dec 2004) and then switch between the MSBs of each accumulator using a MUX on the MSBs.

At your sub-50 kHz, what frequency step can you tolerate? You can trade off average precision for purity. DDS gives a numerical frequency whose average has many digits, but as you have found, it has a lot of phase jitter.

The alternative is a simple divide-by-N (for 100 MHz down to 50 kHz, N = 2000, so your next frequency step (/2001) is just under 25 Hz away). For audio, that's probably tolerable? (You can think of the DDS as dithering between these two values.) At 1 kHz, steps are much smaller.

More complex is to use a DPLL and create Fo = Fin * M/N, scaling both M and N. You will pick up the DPLL jitter as well, but that's usually much smaller than system clock periods.

-jg
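The divide-by-N step sizes quoted above are easy to reproduce. A short Python sketch (the function names are illustrative only) for an ideal integer divider:

```python
def divider_output(f_clk, n):
    """Output frequency of an ideal divide-by-n from f_clk."""
    return f_clk / n


def step_to_next(f_clk, n):
    """Frequency change when the divisor goes from n to n + 1."""
    return f_clk / n - f_clk / (n + 1)


if __name__ == "__main__":
    for n in (2000, 100_000):       # 50 kHz and 1 kHz from a 100 MHz clock
        print(n, divider_output(100e6, n), step_to_next(100e6, n))
```

At N = 2000 the step comes out just under 25 Hz, and at N = 100000 (1 kHz) it is about 0.01 Hz, matching the remark that steps shrink quickly at lower output frequencies.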
Reply by ●February 9, 2006
Thank you for your detailed system description, Raymund - unfortunately cost is critical, and I simply don't have the option of using any external components - hence the desire to synthesize usable audio clocks completely in the FPGA, ideally from a cheap crystal (or the crystal already used by the processor chip, as I'm doing now).

PeterC.
Reply by ●February 9, 2006
Symon,

Yes, I need the MSB out of the FPGA, to drive an audio DAC. Its value only really changes at 50 kHz or so, but to reduce the jitter associated with this low-frequency transition, the clock that drives it out needs to be as fast as possible (obviously). 840 Mbps would give 1.2 ns of jitter, which would be more than good enough. The problem is that the same NCO must generate an (approx) 12 MHz and 24 MHz signal - a few ns of jitter on these is unacceptable.

I will look at the FDDRCPE in the IOBs - great hint and much appreciated.

I'm considering introducing 4 bits of dither, using four 30-bit LFSRs (linear feedback shift registers), which would give a nice and long (in terms of repeat cycles) pseudo-random 4-bit word sequence, to spread out my side-bands (I can live with the raised noise floor).

Cheers, Peter C.
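The 1.2 ns figure follows from treating the edge-placement uncertainty as one output update period. A rough Python sketch of that arithmetic, assuming this simple model holds (the function names are illustrative):

```python
def edge_jitter_ns(update_rate_hz):
    """Peak-to-peak edge placement uncertainty: one output update period."""
    return 1e9 / update_rate_hz


def jitter_fraction(update_rate_hz, f_out_hz):
    """The same uncertainty expressed as a fraction of the output period."""
    return f_out_hz / update_rate_hz


if __name__ == "__main__":
    for rate in (100e6, 200e6, 840e6):   # plain 100 MHz, 100 MHz DDR, XAPP265-style rate
        print(rate, edge_jitter_ns(rate), jitter_fraction(rate, 12e6))
```

Under this model a 100 MHz update rate gives 10 ns, DDR at 100 MHz gives 5 ns, and 840 Mbps gives about 1.19 ns, which is a small fraction of a 50 kHz period but over 1% of a 12 MHz period.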
Reply by ●February 9, 2006
John - Pipelining the accumulators I will certainly look at, and this should be simple since they have simple ripple-carry chains; I will try 8-bit and then 4-bit granularity if needed.

On your point (2), I'm not sure I understand completely - would this require MUXing both inputs of a single adder (both the feedback and the input tuning words), adding an additional MUX delay? Yes, my tuning range spans about 10 kHz around the 50 kHz point, and I would like to do this with single-Hz resolution. If you can send a quick sketch to peter (at) geckoaudio (dot) com that would be great.

By "reducing the resolution of the non-accumulating adders" I take it you mean that since N/4 etc. will be a relatively small number, it certainly would not need to sit in a 32-bit register?

The bit-serial approach is interesting, but I think the internal fabric clock limit is around 300 MHz anyway, and an 8-bit or 4-bit pipelined adder would probably run at close to this (I'm guessing here)?

On the topic of *fun* - how does knowing the ratio of the contents of the accumulator to the tuning word (N) after it turns over help? Excuse my ignorance, but I don't see how this is useful.

Cheers, PeterC.
Reply by ●February 9, 2006
Jim,

I can tolerate a 1 Hz step (I need real-time tuning with at least this resolution, as well as a small number of "coarse" steps of about 5 kHz). Apologies for not posting this initially to eliminate this as a candidate. I have thought about simple integer division, but my range and tuning require DDS. As much as I'd like to, I can't use a PLL due to cost!

Cheers, Peter.
Reply by ●February 9, 2006
PeterC wrote:
> Jim,
>
> I can tolerate a 1 Hz step (I need real-time tuning with at least this resolution, as well as a small number of "coarse" steps of about 5kHz). Apologies for not posting this initially to eliminate this as a candidate, I have thought about the simple integer division - but my range and tuning require DDS. As much as I'd like to, I can't use a PLL due to cost!

The DPLL I meant was the clock module inside the FPGA, not an external one.

A simple divider from ~200 MHz gives better than 1 Hz dF below 14 kHz Fo. Could that be good enough? [It will have very low jitter.]

-jg
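The 14 kHz figure can be sanity-checked with the approximation dF ≈ Fo² / Fclk for adjacent divisors. A small Python sketch (names are illustrative, and the 200 MHz clock is the approximate value mentioned above):

```python
import math


def divider_step_hz(f_clk, f_out):
    """Step to the adjacent divisor near f_out: f_clk/n - f_clk/(n + 1)."""
    n = round(f_clk / f_out)
    return f_clk / n - f_clk / (n + 1)


def max_fout_for_step(f_clk, max_step_hz):
    """Approximate highest f_out whose step stays under max_step_hz.

    Uses step ~= f_out**2 / f_clk, valid when n is large.
    """
    return math.sqrt(max_step_hz * f_clk)


if __name__ == "__main__":
    print(max_fout_for_step(200e6, 1.0))    # limit for 1 Hz steps
    print(divider_step_hz(200e6, 14e3))     # step near 14 kHz
    print(divider_step_hz(200e6, 50e3))     # step near 50 kHz
```

The limit comes out a little above 14.1 kHz, consistent with the post. Near 50 kHz the same divider steps by roughly 12.5 Hz, so a plain divider alone would not meet a 1 Hz requirement there.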