FPGARelated.com

how to speed up my accumulator ??

Started by Moti Cohen December 5, 2004
Allan Herriman wrote:
> On Tue, 07 Dec 2004 15:34:36 +1300, Jim Granville
> <no.spam@designtools.co.nz> wrote:
>
>>John_H wrote:
>>
>>>It's involved...
>>>With your example of 211kHz from a 1MHz reference, the ratio of
>>>906,238,099/2^32 has closest fractions in order of worst to best of
>>>
>>>1/5
>>>4/19
>>>23/109
>>>211/1000
>>>1987386/9418891
>>>19873649/94187910
>>>57633561/273144839
>>>
>>>The offsets are the ideal frequency compared to the ratio frequency:
>>>211 kHz - 200 kHz
>>>211 kHz - 210.52632 kHz
>>>211 kHz - 211.00917 kHz
>>>211 kHz - 211 kHz... Here Excel starts to lose digits:
>>>
>>> The difference between 906,238,099/2^32 and 906,238,099.456/2^32 is about
>>>5.03e-10 at which point small amounts of jitter are lost. If the jitter at
>>>that tiny offset is large, you will experience phase jumps when that beat
>>>frequency is felt. There's no way to filter those with analog filters.
>>>
>>>Your largest observed peaks in the spectrum will be at offsets of 11 kHz,
>>>526 Hz, and 9.17 Hz. You should be able to see the 526 Hz modulating the
>>>11kHz for spikes much smaller than the 11 kHz peak.
>>
>>Good example maths, but is the principle right ?
>>
>>
>>For the example of 211KHz from 1Mhz, you have 1us quantize, and so will
>>be able to generate 4us, or 5us periods, giving 250KHz and 200KHz.
>>
>>Over many cycles, the 'wobbling' between these two will average to
>>211KHz. The more cycles, the better the match to 211KHz.
>>
>>Over a 6 cycle snapshot, you might see 5@200, 1@250, and Favge 208.33Khz
>>That's appx one part in 77 too slow.
>>This 6 cycle frame has a freq of 34.6KHz
>>
>>Next frame group would be (eg)
>>every 79 cycles, to see => 14 @ 250KHz, 65@ 200KHz => 210.76923Khz,
>>Error is now one part in 1000, and this finer frame is 2.65KHz
>>( etc ) as over wider frame snap-shots, the average frequency gets
>>closer to the 211KHz ideal.
>>
>>So I'd expect to see, on a spectrum analyser, 200KHz, (Dominant) 250KHz
>>and 34.6KHz and 2.65KHz (etc) energies.
>
> Did you actually plug it in to a spectrum analyser and see those
> tones?
No, it was just 'back of an envelope' stuff, to get a feel for what repetition frames are likely, and so what the likely energies are.
> Ten highest spurious tones:
>
>  55.000kHz  -11.4dBc
> 367.000kHz  -16.7dBc
> 101.000kHz  -17.0dBc
> 165.000kHz  -22.4dBc
>   9.000kHz  -22.9dBc
> 257.000kHz  -24.1dBc
> 321.000kHz  -25.1dBc
> 147.000kHz  -25.8dBc
> 119.000kHz  -27.4dBc
>  37.000kHz  -27.7dBc
Are these rounded to the nearest KHz, as I can't derive 55.00KHz either...
a 19 cycle @ 1MHz frame, would be 52.63KHz ?

It also seems strange to not see 200KHz, 250KHz... ?

-jg
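The list of closest fractions quoted from John_H above can be checked with exact rational arithmetic. The sketch below is not from the thread: it assumes the quoted tuning word 906,238,099 on a 32-bit accumulator with a 1 MHz clock, and the helper name `convergents` is made up for illustration. It expands the ratio as a continued fraction and prints every convergent with its frequency offset from 211 kHz; each fraction in the quoted list shows up among the convergents.

```python
from fractions import Fraction

def convergents(x):
    """Return the continued-fraction convergents of the Fraction x, in order."""
    result = []
    p_prev, q_prev = 1, 0                      # conventional seed values
    p, q = x.numerator // x.denominator, 1     # integer part is the first convergent
    frac = x - p
    result.append(Fraction(p, q))
    while frac != 0:
        x = 1 / frac
        a = x.numerator // x.denominator       # next partial quotient
        frac = x - a
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
        result.append(Fraction(p, q))
    return result

F_CLK_HZ = 1_000_000        # reference clock from the thread
TUNING_WORD = 906_238_099   # tuning word from the thread
ratio = Fraction(TUNING_WORD, 2**32)

for c in convergents(ratio):
    f_hz = float(c * F_CLK_HZ)
    print(f"{c.numerator}/{c.denominator}: {f_hz / 1e3:.5f} kHz, "
          f"offset from 211 kHz {211e3 - f_hz:+.3e} Hz")
```

The last convergent is the ratio itself, so the loop always terminates for a rational input.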
Moti wrote:
> Hi Falk,
> My German is pretty "rusty" :) so if the document is in .pdf format it
> will be very hard...
> but if it's in an html format it can be translated by google and then it
> will be possible to read it!
> Regards, Moti.
You should be able to copy and paste the text from a PDF into a web page
for translation. But my experience has been that web page translations
give you English that is not much easier to understand than the language
you are translating from.

--
Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design      URL http://www.arius.com
4 King Ave                               301-682-7772 Voice
Frederick, MD 21701-3110                 301-682-7666 FAX
>>>every 79 cycles, to see => 14 @ 250KHz, 65@ 200KHz => 210.76923Khz,
> It also seems strange to not see 200KHz, 250KHz... ?
We started with a 1 MHz clock. Right.

The above recipe repeats after 14*4 + 65*5 cycles. That's a total of
381 uSec, or 2.624671 KHz.

How do I get 200 KHz or 250 KHz from that? What harmonic?
  200 / 2.624671 => 76.200026
  250 / 2.624671 => 95.250033
Those aren't close enough to integers for rounding to explain the
differences.

(I might have fatfingered something.)

--
The suespammers.org mail server is located in California. So are all my
other mailboxes. Please do not send unsolicited bulk e-mail or unsolicited
commercial e-mail to my suespammers.org address or any of my other addresses.
These are my opinions, not necessarily my employer's. I hate spam.
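The frame arithmetic in this post is easy to redo exactly. The sketch below is illustrative only (the function names `frame_stats` and `exact_mix` are invented here): it assumes a 1 MHz clock, so that a "250 kHz" period lasts 4 us and a "200 kHz" period lasts 5 us, and it takes the average output frequency as cycles divided by elapsed time. It also solves for the mix of 4 us and 5 us periods that gives exactly 211 cycles in 1000 us.

```python
from fractions import Fraction

def frame_stats(short_periods, long_periods, short_us=4, long_us=5):
    """Return (frame length in us, frame rate in kHz, average output in kHz)."""
    length_us = short_periods * short_us + long_periods * long_us
    frame_rate_khz = Fraction(1000, length_us)
    average_khz = Fraction(1000 * (short_periods + long_periods), length_us)
    return length_us, frame_rate_khz, average_khz

def exact_mix(cycles=211, window_us=1000, short_us=4, long_us=5):
    """Solve short + long = cycles and short*short_us + long*long_us = window_us."""
    long_periods = (window_us - short_us * cycles) // (long_us - short_us)
    short_periods = cycles - long_periods
    return short_periods, long_periods

# The 79-cycle frame discussed above: 14 periods of 4 us, 65 periods of 5 us.
length_us, frame_khz, avg_khz = frame_stats(14, 65)
print(f"frame {length_us} us, rate {float(frame_khz):.6f} kHz, "
      f"average {float(avg_khz):.3f} kHz")
print(f"200 kHz / frame rate = {float(200 / frame_khz)}")
print(f"250 kHz / frame rate = {float(250 / frame_khz)}")

# Mix of periods that lands exactly on 211 cycles per millisecond.
short_n, long_n = exact_mix()
print(f"{short_n} periods of 4 us + {long_n} periods of 5 us "
      f"-> {float(frame_stats(short_n, long_n)[2])} kHz")
```

Using `Fraction` avoids the rounding that crept into the hand calculation (200 kHz divided by the frame rate is exactly 76.2, not an integer).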
"John_H" <johnhandwork@mail.com> wrote in message
news:yB5td.10$a61.633@news-west.eli.net...

> The difference between 906,238,099/2^32 and 906,238,099.456/2^32 is about
> 5.03e-10 at which point small amounts of jitter are lost. If the jitter at
> that tiny offset is large, you will experience phase jumps when that beat
> frequency is felt. There's no way to filter those with analog filters.

I guess the trick is noise shaping: adding a (pseudo)random phase error to
distribute the jitter energy over a wider band and also move it to higher
frequencies, Sigma-Delta style.

Regards
Falk
"Falk Brunner" <Falk.Brunner@gmx.de> wrote in message
news:31mt3nF3d9asiU1@individual.net...
>
> "John_H" <johnhandwork@mail.com> wrote in message
> news:yB5td.10$a61.633@news-west.eli.net...
>
> > The difference between 906,238,099/2^32 and 906,238,099.456/2^32 is about
> > 5.03e-10 at which point small amounts of jitter are lost. If the jitter at
> > that tiny offset is large, you will experience phase jumps when that beat
> > frequency is felt. There's no way to filter those with analog filters.
>
> I guess the trick is noise shaping. Adding a (pseudo)random phase error to
> distribute the jitter energy over a wider band and also move it to higher
> frequencies. Sigma-Delta Style.
>
> Regards
> Falk
Noise shaping is the right way to go for a superb quality synthesizer, but
the correction phase error - the output from the noise shaper - needs to be
applied based on the synchronous edge position relative to the "ideal" edge
position - the input to the noise shaper. (Pseudo)Random doesn't do it.

All this assumes, of course, that there's an analog PLL driven by the single
bit, noise-shaped NCO output. Without the PLL to filter out the high
frequency phase noise of a Sigma-Delta style NCO, the jitter is still around
1 reference clock period peak-to-peak, maybe worse.

(NCOs are used by many folks in the comp.arch.fpga newsgroup who have no
reason to visit comp.dsp.)
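The "about one reference clock period peak-to-peak" figure can be checked for the simplest case, a plain accumulator NCO with no noise shaping and no PLL. The sketch below is an illustration under that assumption, not the noise-shaped design discussed above; the tuning word is the one quoted in the thread and the function name `edge_errors` is invented. Each time the accumulator wraps, it records how late the clocked edge is compared with the instant the ideal phase would have wrapped.

```python
ACC_BITS = 32
MODULUS = 1 << ACC_BITS
TUNING_WORD = 906_238_099   # 211 kHz from a 1 MHz clock, value quoted in the thread

def edge_errors(tuning_word, n_clocks):
    """Return the lateness of each wrap edge, in reference clock periods.

    The residue left in the accumulator after a wrap is the phase overshoot,
    so residue / tuning_word is the fraction of a clock period by which the
    clocked edge trails the ideal one (always 0 <= error < 1).
    """
    acc = 0
    errors = []
    for _ in range(n_clocks):
        acc += tuning_word
        if acc >= MODULUS:
            acc -= MODULUS
            errors.append(acc / tuning_word)
    return errors

errors = edge_errors(TUNING_WORD, 100_000)
print(f"{len(errors)} edges, lateness from {min(errors):.4f} "
      f"to {max(errors):.4f} clock periods")
```

The spread between the smallest and largest lateness is the peak-to-peak edge jitter of the unfiltered output, in units of the reference clock period.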
"John_H" <johnhandwork@mail.com> wrote in message
news:S9std.14$a61.1075@news-west.eli.net...

> All this assumes, of course, that there's an analog PLL driven by the single
> bit, noise-shaped NCO output. Without the PLL to filter out the high
> frequency phase noise of a Sigma-Delta style NCO, the jitter is still around
> 1 reference clock period peak-to-peak, maybe worse.
Yes.
> (NCOs are used by many folks in the comp.arch.fpga newsgroup who have no
> reason to visit comp.dsp.)

???? Don't get it.

Regards
Falk
John_H wrote:
>
> Noise shaping is the right way to go for a superb quality synthesizer, but
> the correction phase error - the output from the noise shaper - needs to be
> applied based on the synchronous edge position relative to the "ideal" edge
> position - the input to the noise shaper. (Pseudo)Random doesn't do it.
>
> All this assumes, of course, that there's an analog PLL driven by the single
> bit, noise-shaped NCO output. Without the PLL to filter out the high
> frequency phase noise of a Sigma-Delta style NCO, the jitter is still around
> 1 reference clock period peak-to-peak, maybe worse.
That answers a question I have had for a long time. It occurred to me a
long time ago to use an analog PLL to smooth out the ragged edges in an
NCO clock. But no one I spoke to about it could say if it would work.

I always figured that the low pass filter would do the smoothing for
me. I should never have doubted myself. ;)

--
Rick "rickman" Collins
On Wed, 08 Dec 2004 18:04:31 -0500, rickman <spamgoeshere4@yahoo.com>
wrote:

>John_H wrote:
>>
>> Noise shaping is the right way to go for a superb quality synthesizer, but
>> the correction phase error - the output from the noise shaper - needs to be
>> applied based on the synchronous edge position relative to the "ideal" edge
>> position - the input to the noise shaper. (Pseudo)Random doesn't do it.
>>
>> All this assumes, of course, that there's an analog PLL driven by the single
>> bit, noise-shaped NCO output. Without the PLL to filter out the high
>> frequency phase noise of a Sigma-Delta style NCO, the jitter is still around
>> 1 reference clock period peak-to-peak, maybe worse.
>
>That answers a question I have had for a long time. It occured to me a
>long time ago to use an analog PLL to smooth out the ragged edges in an
>NCO clock. But no one I spoke to about it could say if it would work.
I must be a 'no one'.

Rick, we have discussed this before, e.g. in this thread:
http://groups-beta.google.com/group/comp.arch.embedded/browse_frm/thread/7e0ec68b5c53e4

This is something I've done in real designs. I've also developed
tools for estimating the output jitter of the NCO, taking the loop
bandwidth (and order) of the PLL into account.
It is possible to achieve very low levels of jitter at the PLL output,
if the frequencies are carefully chosen such that the higher level
spurious signals at the output of the NCO are well outside the PLL
loop bandwidth.

>I always figured that the low pass filter would do the smoothing for
>me.

Exactly. Although this does require the phase detector to be linear
(otherwise the jitter signals will be demodulated). Common phase
detector types (e.g. most digital phase detectors driving charge
pumps) aren't particularly linear due to inexact balance between the
pull-up and pull-down current sources. A figure of 10% is sometimes
quoted.

Regards,
Allan
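To get a rough feel for why the spurs need to sit well outside the loop bandwidth, the sketch below treats the PLL's jitter transfer as a plain first-order low-pass. That model, the 100 Hz loop bandwidth, and the function name are all assumptions made for illustration; they are not Allan's tools, and a real loop of higher order can behave differently near the corner. The jitter frequencies are the three offsets quoted earlier in the thread.

```python
import math

def first_order_attenuation_db(f_jitter_hz, loop_bw_hz):
    """Attenuation in dB (positive = reduced) of jitter at f_jitter_hz by a
    first-order low-pass with its -3 dB corner at loop_bw_hz (assumed model)."""
    return 10 * math.log10(1 + (f_jitter_hz / loop_bw_hz) ** 2)

LOOP_BW_HZ = 100.0   # hypothetical loop bandwidth, not taken from the thread

for offset_hz in (11e3, 526.0, 9.17):   # offsets quoted earlier in the thread
    att = first_order_attenuation_db(offset_hz, LOOP_BW_HZ)
    print(f"jitter at {offset_hz:>8.2f} Hz: attenuated by {att:6.2f} dB")
```

Under this simple model, a component far above the corner is strongly attenuated, while one below the corner passes almost untouched, which is the point about choosing frequencies carefully.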
Allan Herriman wrote:
>
> On Wed, 08 Dec 2004 18:04:31 -0500, rickman <spamgoeshere4@yahoo.com>
> wrote:
>
> I must be a 'no one'.
Well, I wouldn't go *that* far.. :)
> Rick, we have discussed this before, e.g. in this thread:
> http://groups-beta.google.com/group/comp.arch.embedded/browse_frm/thread/7e0ec68b5c53e4
You have a *much* better memory than I do. I think I had looked into
this, but my idea was rejected by higher ups in favor of a specialized
chip that actually used the top N bits of the accumulator to drive a
DAC. This sine wave was then filtered and fed back to the chip for
clipping via a comparator.
> This is something I've done in real designs. I've also developed
> tools for estimating the output jitter of the NCO, taking the loop
> bandwidth (and order) of the PLL into account.
> It is possible to achieve very low levels of jitter at the PLL output,
> if the frequencies are carefully chosen such that the higher level
> spurious signals at the output of the NCO are well outside the PLL
> loop bandwidth.
I looked at the posts that you refer to. That post has some defunct links for other posts or web pages. Heck, a couple of them are to altavista that doesn't even refer you to whoever bought them. Things change fast on the internet.
> >I always figured that the low pass filter would do the smoothing for
> >me.
>
> Exactly. Although this does require the phase detector to be linear
> (otherwise the jitter signals will be demodulated). Common phase
> detector types (e.g. most digital phase detectors driving charge
> pumps) aren't particularly linear due to inexact balance between the
> pull-up and pull-down current sources. A figure of 10% is sometimes
> quoted.
What phase detectors *are* linear?

--
Rick "rickman" Collins
Moti,

There are a couple things you can do.  First off, if you look closely at
an accumulator, the feedback from a particular bit only affects that bit
and bits with greater significance.  That suggests that you can perform
partial sums and then combine the results.  One simple trick that takes 2x
the resources of the straight accumulator is to break your 32 bit
accumulator into two 16 bit accumulators.  The carry out of the first gets
registered and fed into the carry in of the second.  Note that by doing
that, the second follows the first by a clock cycle, so you need to delay
the upper half of the addend (the new value getting added, not the
feedback value) by a clock cycle using a register so that it arrives at
the upper half of the accumulator at the same time as the registered carry
from the lower half.  Likewise, the lower half sum output (but not the
feedback) has to be delayed by a clock cycle to align it with the upper
half sum.  On the surface, that would seem to permit almost double the
clock speed (and it did in older Xilinx devices), however in Virtex
devices the propagation time to get on and off the carry chain is an order
of magnitude larger than the bit to bit propagation times, so in reality
the gain from this trick is rather small until you get into truly huge
accumulator widths.
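The split-accumulator trick described above can be sanity-checked with a small behavioural model before committing it to HDL. The sketch below is Python rather than VHDL and is only an illustration: the register names (`lo`, `hi`, `carry`, `inc_hi_d`, `lo_d`) are invented, and all registers are assumed to reset to zero. It compares the two-stage version against a straight 32-bit accumulator and confirms that the pipelined output is the same sequence, one clock later.

```python
import random

MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF

def reference_accumulator(increments):
    """Straight 32-bit accumulator: output seen before each clock edge."""
    acc, out = 0, []
    for inc in increments:
        out.append(acc)
        acc = (acc + inc) & MASK32
    return out

def split_accumulator(increments):
    """Two 16-bit halves with a registered carry between them."""
    lo = hi = carry = inc_hi_d = lo_d = 0
    out = []
    for inc in increments:
        out.append((hi << 16) | lo_d)          # aligned output of both halves
        # Next-state values, all computed from the current register contents.
        lo_sum = lo + (inc & MASK16)
        hi_sum = hi + inc_hi_d + carry
        # Every register updates on the same clock edge.
        lo_d = lo                              # delay lower sum to match upper
        lo, carry = lo_sum & MASK16, lo_sum >> 16
        inc_hi_d = (inc >> 16) & MASK16        # delay upper half of the addend
        hi = hi_sum & MASK16
    return out

rng = random.Random(0)
incs = [rng.getrandbits(32) for _ in range(1000)]
same = split_accumulator(incs)[1:] == reference_accumulator(incs)[:-1]
print("pipelined output matches reference with one clock of latency:", same)
```

The one clock of extra latency is the only visible difference, which for a free-running NCO is just a fixed phase offset at the output.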

A more usable trick requires a little more attention to the design
implementation.  The carry chains are typically the critical path (mostly
because the times to get on and off the chain are on par with the LUT
delay).  You can't do much about the delay in the carry chain or
the intrinsic delay for getting on and off the chain.  You can, however,
minimize the delays on the signal connecting to the carry chain input.
This means making sure that you only have one level of logic (i.e.,
flip-flops are directly driving the LUTs that feed the carry chain), and
you need to make sure those flip-flops are placed in close proximity to
the carry chain (ideally either in the same CLB, or on an adjacent CLB so
you can use the direct connect wires).  Note that the automatic placement
is not particularly good at making sure those flip-flops are placed this
way.  The accumulator feedback doesn't need to be pipelined because it is
connecting back around to the same bit (assuming you've reduced the logic
to 1 level), which means it is already pipelined as much as it could be.
You may need to pipeline the new addend path in order to achieve the one
level of logic at the accumulator and keep the driving flip-flops in
adjacent slices, but that is OK as it doesn't affect the accumulator
operation.

You normally should use active high resets because that is what is native
to the FPGA.  In this case, I don't think it is affecting your timing
however, because it is an asynchronous reset.  Had it been a synchronous
reset, some synthesizers would have inserted a gate between the carry
chain and the register, which would have added an extra LUT delay to the
input path rather than inverting the resetn signal.

Hope this helps


Moti Cohen wrote:

> Hello all,
> I've a design that contains a NCO (Numerically controlled oscillator).
> The NCO consists of a 32-bit accumulator. When I write the accumulator
> straightforwardly like this -
>
> process (clk,resetn)
> begin
>   if resetn = '0' then
>     accumulator <= (others =>'0');
>   elsif clk'event and clk ='1' then
>     accumulator <= accumulator + inc_value;
>   end if;
> end process;
> Fout <= accumulator (accumulator'high);
>
> the maximum frequency I can achieve for 'clk' is ~ 150 MHz (spartan 3).
> I need it to work in ~200 MHz so I figured out that some pipelining is
> needed but I don't know how to do it because of the accumulator
> feedback. Maybe someone here can explain it to me or even give me a
> code example (which will be great).
>
> Thanks in advance, Moti.
--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

 "They that give up essential liberty to obtain a little
  temporary safety deserve neither liberty nor safety."
                                          -Benjamin Franklin, 1759
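For reference, the accumulator in the original question can be modelled in a few lines to check the relationship between `inc_value` and the output frequency, Fout = inc_value * Fclk / 2^32 for a 32-bit accumulator. The sketch below is a behavioural Python model, not synthesizable code; the function names are invented, and the 211 kHz from 1 MHz numbers are the example used elsewhere in the thread rather than Moti's actual clock rates.

```python
ACC_BITS = 32

def tuning_word(f_out_hz, f_clk_hz, acc_bits=ACC_BITS):
    """Nearest integer inc_value for the requested output frequency."""
    return round(f_out_hz * (1 << acc_bits) / f_clk_hz)

def count_rising_edges(inc_value, n_clocks, acc_bits=ACC_BITS):
    """Model 'accumulator <= accumulator + inc_value' and count the
    0 -> 1 transitions of the MSB, which is what drives Fout."""
    mask = (1 << acc_bits) - 1
    acc = prev_msb = edges = 0
    for _ in range(n_clocks):
        acc = (acc + inc_value) & mask         # wraps like the VHDL vector
        msb = acc >> (acc_bits - 1)
        if msb and not prev_msb:
            edges += 1
        prev_msb = msb
    return edges

F_CLK_HZ = 1_000_000
inc_value = tuning_word(211_000, F_CLK_HZ)
n_clocks = 100_000                             # 0.1 s of simulated time
edges = count_rising_edges(inc_value, n_clocks)
print(f"inc_value = {inc_value}")
print(f"measured  = {edges * F_CLK_HZ / n_clocks / 1e3:.2f} kHz average")
```

Counting edges over a long window recovers the average frequency only; the individual periods still vary by one clock, as discussed earlier in the thread.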