comp.arch.fpga | Is this phase accumulator trick well-known???| page 2

Reply by Glen Herrmannsfeldt ●February 8, 20092009-02-08

Jonathan Bromley wrote:

> hi comp.arch.fpga,
> (accidentally posted to comp.lang.vhdl 
> a few moments ago- sorry)

> The question - repeated after the explanation - 
> is: here's what I think is a nifty trick; has
> anyone seen it, or been aware of it, before?
> I can't believe it's really new.

As far as I know, the usual system is to use a power of
two modulus and supply enough bits to get the required
accuracy.  It is then easy to take the top bit as
a (close enough to) 50% duty cycle clock.

> I have been messing around with baud rate generators
> and suchlike - creating a pulse that's active for
> one clock period at some required repetition rate -
> and wanted to try a phase accumulator technique 
> instead of a simple divider.  That makes it far 
> easier to specify the frequency - it's simply the
> phase-delta input value - and easily allows for
> non-integral divide ratios, at the cost of one
> master clock period of jitter.

> The phase-accumulator produces pulses with a
> repetition rate of 
>   Fc * M / N 
> where Fc is the master clock, M is the phase delta
> and N is the counter's modulus.  However, to get
> the huge convenience of specifying M as the required
> frequency, I must make N be equal to the frequency
> of Fc, and this is unlikely to be an exact power of 2.
> So the phase accumulator works like this:

>   on every clock pulse...
>     if (acc < 0) then
>       add := acc + N;
>       output_pulse <= '1';
>     else
>       output_pulse <= '0';
>     end if;
>     acc := acc - M;  -- unconditionally

> This is fine, but it means that on the "wrap-around"
> clock cycle I must add either N-M to the accumulator;
> if either M or N are variable, that costs me another
> adder.

Adders are pretty cheap in FPGAs.  It would take some
work to see if your method is a net savings.

> Today I came up with an intriguing (to me) alternative:
> on the wrap-around cycle, add N to the accumulator;
> on the immediately subsequent cycle, add (-2M); on
> all other cycles, add (-M).  This is of course rather
> easy to do since 2M is just a left shift.  A few 
> trial synthesis runs convinced me that it will give
> measurably better performance than the two-adder
> version.  VHDL code is appended for anyone who wants
> to play.

It seems to me that needs an additional MUX to
save an adder.  In 4LUT logic, I believe that
is pretty close to the same in CLBs.

> My question is: has this trick been published anywhere?
> Or is it something that "those skilled in the art"
> already know about?  I haven't seen it before, but that 
> simply means I probably haven't looked hard enough.

There are a large number of tricks that have been used
over the years to make computer arithmetic faster.  Many of
those tricks aren't as effective in FPGA logic, so new
ones will have to be developed.

-- glen

Reply by nospam ●February 8, 20092009-02-08

"kadhiem_ayob" <kadhiem_ayob@yahoo.co.uk> wrote:

>Hi Jonathan,
>
>I can't quite get your idea (You may have mistyped M,N)
>
>This is what I do for phase accummulator(modulus not power of 2), 
>e.g. modulus = 1600, increment = 869
>------------------------------ 
>if(modulo_adder < 1600)then
>     modulo_adder <= modulo_adder + 869;
>     overflow <= '0';
>else
>     modulo_adder <= modulo_adder + 869 - 1600; 
>     overflow <= '1';
>end if;
>-------------------------------

Where the 869 or 1600 are variable your code requires two adders and a 2
way mux. 

Jonathan's code avoids adding the increment on overflow cycles and makes up
by adding double the increment in the cycle following overflow. That
requires 1 adder and a 3 way mux. It won't work when the increment is more
than half the modulus. 

Well known trick or not I have no idea. 
--

Reply by Mike Treseler ●February 8, 20092009-02-08

Jonathan Bromley wrote:

> It simulates correctly too, although of course it needs to 
> be wrapped in something that provides a signal to connect
> to its unconstrained input port. 

This may be the point that Antti missed,
as he mentioned "unmodified" code.
Without a wrapper or constraint, I wouldn't expect
elaboration to succeed:

# vsim -c rate_gen
# Loading /flip/usr1/modeltech/linux/../std.standard
# Loading /flip/usr1/modeltech/linux/../ieee.std_logic_1164(body)
# Loading /flip/usr1/modeltech/linux/../ieee.numeric_std(body)
# Loading work.rate_gen(rtl)#1
# ** Fatal: (vsim-3347) Port 'rate' is not constrained.

A minimal constraining modification like this:
 rate  : in  unsigned(15 downto 0)
proves that the code would sim ok, with a proper instance.

# vsim -c rate_gen
# Loading /flip/usr1/modeltech/linux/../std.standard
# Loading /flip/usr1/modeltech/linux/../ieee.std_logic_1164(body)
# Loading /flip/usr1/modeltech/linux/../ieee.numeric_std(body)
# Loading work.rate_gen(rtl)#1
VSIM 1> run
VSIM 2>

 -- Mike Treseler

Reply by Hal Murray ●February 8, 20092009-02-08

>  if (...overflow...) then
>    modulo_adder <= modulo_adder + N;
>  else
>    modulo_adder <= modulo_adder + N - 1600;
>
>The last line requires TWO adders, in addition to the
>multiplexer created by the IF.  This causes a significant
>performance hit.  That's what I was trying to fix.  I did
>it by saying...


>So we modify the code like this:
>
>  if (...overflow...) then
>    adder <= adder + N;
>  elsif (...there was an overflow on the previous clock...)
>    adder <= adder + 2*N;  -- make up for the lost N
>  else
>    adder <= adder - 1600;

It looks like a 2 input mux with 2 adders vs
a 3 input mux with 1 adder.

The second adder can be pulled out and done ahead of time, aka pipelined.
If speed rather than space is your goal, that might be better.

-- 
These are my opinions, not necessarily my employer's.  I hate spam.

Reply by Nico Coesel ●February 8, 20092009-02-08

Jonathan Bromley <jonathan.bromley@MYCOMPANY.com> wrote:

>hi comp.arch.fpga,
>(accidentally posted to comp.lang.vhdl 
>a few moments ago- sorry)
>
>The question - repeated after the explanation - 
>is: here's what I think is a nifty trick; has
>anyone seen it, or been aware of it, before?
>I can't believe it's really new.
>
>I have been messing around with baud rate generators
>and suchlike - creating a pulse that's active for
>one clock period at some required repetition rate -
>and wanted to try a phase accumulator technique 
>instead of a simple divider.  That makes it far 
>easier to specify the frequency - it's simply the
>phase-delta input value - and easily allows for
>non-integral divide ratios, at the cost of one
>master clock period of jitter.
>
>The phase-accumulator produces pulses with a
>repetition rate of 
>  Fc * M / N 
>where Fc is the master clock, M is the phase delta
>and N is the counter's modulus.  However, to get
>the huge convenience of specifying M as the required
>frequency, I must make N be equal to the frequency
>of Fc, and this is unlikely to be an exact power of 2.
>So the phase accumulator works like this:
>
>  on every clock pulse...
>    if (acc < 0) then
>      add := acc + N;
>      output_pulse <= '1';
>    else
>      output_pulse <= '0';
>    end if;
>    acc := acc - M;  -- unconditionally

What if you simply add N-M to the accumulator?

  on every clock pulse...
    if (acc < 0) then
      acc := acc + (N -M);
      output_pulse <= '1';
    else
      output_pulse <= '0';
      acc := acc - M;
    end if;


-- 
Failure does not prove something is impossible, failure simply
indicates you are not using the right tools...
                     "If it doesn't fit, use a bigger hammer!"
--------------------------------------------------------------

Reply by Glen Herrmannsfeldt ●February 8, 20092009-02-08

Jonathan Bromley wrote:
(snip)

> Yes, exactly.  But if "869" is a variable (from an
> input port) so that the rate is configurable, then
> you have

>   if (...overflow...) then
>     modulo_adder <= modulo_adder + N;
>   else
>     modulo_adder <= modulo_adder + N - 1600;

> The last line requires TWO adders, in addition to the
> multiplexer created by the IF.  This causes a significant
> performance hit.  That's what I was trying to fix. 

The adders will run in parallel, so there should be no
performance difference.

-- glen

Reply by Muzaffer Kal ●February 8, 20092009-02-08

On Sun, 08 Feb 2009 17:37:18 -0700, Glen Herrmannsfeldt
<gah@ugcs.caltech.edu> wrote:

>Jonathan Bromley wrote:
>(snip)
>
>> Yes, exactly.  But if "869" is a variable (from an
>> input port) so that the rate is configurable, then
>> you have
>
>>   if (...overflow...) then
>>     modulo_adder <= modulo_adder + N;
>>   else
>>     modulo_adder <= modulo_adder + N - 1600;
>
>> The last line requires TWO adders, in addition to the
>> multiplexer created by the IF.  This causes a significant
>> performance hit.  That's what I was trying to fix. 
>
>The adders will run in parallel, so there should be no
>performance difference.

I don't think they can. The first line needs a 2 input adder and the
second line needs two 2 input adders in a tree or a 3 input adder. No
matter how you look at it, that path will be slower than a 2 input
adder; whether slow enough to matter is a different issue.
More precisely, there are two paths 
1) module_adder.Q to module_adder.D  and
2) N.Q to module_adder.D

The second path here will have some extra gates on it because before
N.Q can be presented to the input of the second (m_a+N) adder, it has
to go through one more adder to calculate N.Q -1600, which is the
extra delay.
-- Muzaffer Kal

DSPIA INC.
ASIC/FPGA Design Services
http://www.dspia.com

Reply by Joseph H Allen ●February 9, 20092009-02-09

I remember implementing Bresenham's line drawing algorithm on 8-bit CPUs
like this:

	suba	#N	* Subtract N
	bcc	no_ov	* Branch if no borrow...
	adda	#M	* Add M
	inc	cur_y	* increment y axis or whatever...
no_ov

(bonus points if you can identify the CPU and the assembler...)

In Verilog this would be something like:

	reg [16:0] accu;

	if (accu[16]) // Did previous (- N) overflow?
	  accu <= accu + M; // Yes, add M
	else
	  accu <= accu - N;


-- 
/*  jhallen@world.std.com AB1GO */                        /* Joseph H. Allen */
int a[1817];main(z,p,q,r){for(p=80;q+p-80;p-=2*a[p])for(z=9;z--;)q=3&(r=time(0)
+r*57)/7,q=q?q-1?q-2?1-p%79?-1:0:p%79-77?1:0:p<1659?79:0:p>158?-79:0,q?!a[p+q*2
]?a[p+=a[p+=q]=q]=q:0:0;for(;q++-1817;)printf(q%79?"%c":"%c\n"," #"[!a[q-1]]);}

Reply by Glen Herrmannsfeldt ●February 9, 20092009-02-09

Muzaffer Kal wrote:

(after someone wrote)
>>>  if (...overflow...) then
>>>    modulo_adder <= modulo_adder + N;
>>>  else
>>>    modulo_adder <= modulo_adder + N - 1600;

>>>The last line requires TWO adders, in addition to the
>>>multiplexer created by the IF.  This causes a significant
>>>performance hit.  That's what I was trying to fix. 

(I wrote)
>>The adders will run in parallel, so there should be no
>>performance difference.

> I don't think they can. The first line needs a 2 input adder and the
> second line needs two 2 input adders in a tree or a 3 input adder. No
> matter how you look at it, that path will be slower than a 2 input
> adder; whether slow enough to matter is a different issue.

I thought N was more or less constant.  It should probably be

        modulo_adder <= modulo_adder + (N - 1600);

to make sure it comes out that way, though.

In any case, it should run at least at 100MHz even with two adders
on most current FPGAs.

(snip)

-- glen

Reply by Antti ●February 9, 20092009-02-09

On Feb 9, 1:12=A0am, Mike Treseler <mtrese...@gmail.com> wrote:
> Jonathan Bromley wrote:
> > It simulates correctly too, although of course it needs to
> > be wrapped in something that provides a signal to connect
> > to its unconstrained input port.
>
> This may be the point that Antti missed,
> as he mentioned "unmodified" code.
> Without a wrapper or constraint, I wouldn't expect
> elaboration to succeed:
>
> # vsim -c rate_gen
> # Loading /flip/usr1/modeltech/linux/../std.standard
> # Loading /flip/usr1/modeltech/linux/../ieee.std_logic_1164(body)
> # Loading /flip/usr1/modeltech/linux/../ieee.numeric_std(body)
> # Loading work.rate_gen(rtl)#1
> # ** Fatal: (vsim-3347) Port 'rate' is not constrained.
>
> A minimal constraining modification like this:
> =A0rate =A0: in =A0unsigned(15 downto 0)
> proves that the code would sim ok, with a proper instance.
>
> # vsim -c rate_gen
> # Loading /flip/usr1/modeltech/linux/../std.standard
> # Loading /flip/usr1/modeltech/linux/../ieee.std_logic_1164(body)
> # Loading /flip/usr1/modeltech/linux/../ieee.numeric_std(body)
> # Loading work.rate_gen(rtl)#1
> VSIM 1> run
> VSIM 2>
>
> =A0-- Mike Treseler

Mike I am not that dumb ;)

the original code does NOT pass synthesis without wrapper
after passing the synthesis with wrapper it does not pass
compile for Xilinx ISIM
the same code may pass __all__ others simulator,
i made no statement about an of them, only that
code making xilinx XST happy doest not pass ISIM
so those tools VHDL support level is just different.

i constrained the signal connected to rate in the wrapper as:

signal rate : unsigned(15 downto 0) :=3D X"2580";

this was the _needed_ trick in the wrapper.
i assumed Jonathans code does not modifications.
what he claims the case if the tools are fully VHDL-93 compliant

Antti

Previous 123 4 5 Next

Is this phase accumulator trick well-known???

Sign in

You might also like...

Search forums

Free PDF Downloads

Blogs - Hall of Fame

Quick Links

About FPGARelated.com

Social Networks

The Related Media Group