On Tue, 10 Feb 2009 17:13:07 GMT, Nico Coesel wrote:


>>...I wanted to deal with the case where M is
>>a variable and in that case we end up with two
>>adders which have to be cascaded somehow.
>
>You don't have to. Just say A=N-M and add A to the pulse accumulator.
>A can be calculated in a separate process.

Indeed it can, but in what way is that different
from "cascaded somehow"?  As I and others have
already pointed out, you can pipeline the N-M 
addition at zero cost in an FPGA; but you still
need two adders.
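
To make the point concrete, here is a rough behavioural model in Python (not synthesizable, and not anyone's posted code) of the pipelined arrangement: one subtractor writes N-M into a register on every clock, and a second adder uses the registered value on the wrap-around cycle. The names pipelined_rate_gen, n and rates are invented for this sketch, and the one-clock delay of a registered pulse output is ignored.

```python
def pipelined_rate_gen(n, rates):
    """Yield one pulse flag (0 or 1) per clock.

    n     -- accumulator modulus N (hypothetical parameter name)
    rates -- iterable giving the phase delta M seen on each clock
    """
    count = 0
    wrap = n  # pipeline register: holds N - M computed on the previous clock
    for m in rates:
        if count < 0:
            pulse = 1
            count += wrap      # first adder, on the critical path
        else:
            pulse = 0
            count -= m
        wrap = n - m           # second adder, registered, off the critical path
        yield pulse
```

With N = 50 and M held at 7, sum(pipelined_rate_gen(50, [7] * 500)) comes out at 70, i.e. M pulses per N clocks. Both arithmetic operations are still there in the loop body; pipelining only takes the N-M one out of the timing path.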
-- 
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
jonathan.bromley@MYCOMPANY.com
http://www.MYCOMPANY.com

The contents of this message may contain personal views which 
are not the views of Doulos Ltd., unless specifically stated.

Jonathan Bromley <jonathan.bromley@MYCOMPANY.com> wrote:

>On Sun, 08 Feb 2009 23:23:00 GMT, Nico Coesel wrote:
>
>
>>What if you simply add N-M to the accumulator?
>>
>>  on every clock pulse...
>>    if (acc < 0) then
>>      acc := acc + (N -M);
>>      output_pulse <= '1';
>>    else
>>      output_pulse <= '0';
>>      acc := acc - M;
>>    end if;
>
>That is very good if N and M are both constants,
>but I wanted to deal with the case where M is
>a variable and in that case we end up with two
>adders which have to be cascaded somehow.

You don't have to. Just say A=N-M and add A to the pulse accumulator.
A can be calculated in a separate process.

-- 
Failure does not prove something is impossible, failure simply
indicates you are not using the right tools...
                     "If it doesn't fit, use a bigger hammer!"
--------------------------------------------------------------

On Mon, 9 Feb 2009 16:32:44 -0800 (PST), rickman <gnuarm@gmail.com> wrote:

>> OK, works for 4-input LUTs.
>
>Did you forget 1 + 1 + 1 + carryin = 100 ?

Ouch, incorrectly remembered 3 to 2 counter principle.
Should not write about such stuff late at night.
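
For reference, the 3-to-2 principle is usually called a carry-save adder or 3:2 compressor: three operands are reduced to two with no carry travelling between bit positions, and only the final two-operand addition needs a carry chain. A minimal Python sketch on non-negative integers follows; the function names are illustrative only.

```python
def compress_3_to_2(a, b, c):
    """Bitwise 3:2 compressor: returns (partial_sum, carries)."""
    partial_sum = a ^ b ^ c                        # sum bit of each column
    carries = ((a & b) | (a & c) | (b & c)) << 1   # majority bit, weight 2
    return partial_sum, carries


def add3(a, b, c):
    """Add three numbers using one compressor row and one ordinary adder."""
    partial_sum, carries = compress_3_to_2(a, b, c)
    return partial_sum + carries  # the only carry-propagating addition
```

This still costs a row of LUTs in front of a normal adder, so it does not contradict the objection above: a single ripple chain adding three operands directly would have to pass a carry of up to 2 between columns, which is the 1 + 1 + 1 + carryin = 100 case.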


Gerhard

On Sun, 08 Feb 2009 23:23:00 GMT, Nico Coesel wrote:


>What if you simply add N-M to the accumulator?
>
>  on every clock pulse...
>    if (acc < 0) then
>      acc := acc + (N -M);
>      output_pulse <= '1';
>    else
>      output_pulse <= '0';
>      acc := acc - M;
>    end if;

That is very good if N and M are both constants,
but I wanted to deal with the case where M is
a variable and in that case we end up with two
adders which have to be cascaded somehow.

I revisited the problem for the case where N
and M are both constants, and noted that you
can easily precalculate the greatest common divisor
of N, M and thereby reduce the fraction M/N 
to its lowest terms.  This helps to minimize 
the design without unnecessary human effort,
which rates pretty highly on my lazy man's 
list of desiderata.  Here's the code...

  library ieee;
  use ieee.std_logic_1164.all;

  entity fixed_rate_gen is
    generic (divisor, multiplier: positive);
    port (clock: in std_logic; pulse: out std_logic);
  end;
  architecture rtl of fixed_rate_gen is
    -- put this function in a package, or in the architecture
    function euclid_gcd(divisor, multiplier: positive) 
        return positive is
      variable r0, r1, r: natural;
    begin
      -- multiplier = divisor would make "wrap" zero, which is not positive
      assert multiplier < divisor 
        report "Multiplier must be less than divisor"
        severity failure;
      r0 := multiplier;
      r1 := divisor;
      while r0 /= 0 loop
        r := r1 rem r0;
        r1 := r0;
        r0 := r;
      end loop;
      return r1;
    end;
    constant gcd: positive := euclid_gcd(divisor, multiplier);
    constant m: positive := multiplier/ gcd;
    constant wrap: positive := (divisor / gcd) - m;
  begin
    process (clock)
      variable acc: integer range -m to wrap-1 := 0;
    begin
      if rising_edge(clock) then
        if acc < 0 then
          acc := acc + wrap;
          pulse <= '1';
        else
          acc := acc - m;
          pulse <= '0';
        end if;
      end if;
    end process;
  end;

I love the Euclid GCD algorithm - so neat and 
simple, so non-obvious (to me at least).      
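
As a quick check of what the reduction buys, here is the same constant calculation as a small Python sketch. It mirrors the euclid_gcd function and the m and wrap constants above; reduced_constants is a name made up for this example.

```python
def euclid_gcd(divisor, multiplier):
    """Euclid's algorithm, same remainder loop as the VHDL function."""
    r0, r1 = multiplier, divisor
    while r0 != 0:
        r0, r1 = r1 % r0, r0
    return r1


def reduced_constants(divisor, multiplier):
    """Return (gcd, m, wrap) as computed in the architecture's constants."""
    g = euclid_gcd(divisor, multiplier)
    m = multiplier // g
    wrap = divisor // g - m
    return g, m, wrap
```

For a 50 MHz clock and 115200 pulses per second, reduced_constants(50_000_000, 115_200) returns (3200, 36, 15589), so the accumulator only has to span -36 to 15588 instead of roughly -115200 to 49.9 million.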

On Mon, 9 Feb 2009 14:13:23 -0800 (PST), Gabor wrote:

>On Feb 8, 12:02 pm, Jonathan Bromley wrote:
>> The question - repeated after the explanation -
>> is: here's what I think is a nifty trick; has
>> anyone seen it, or been aware of it, before?
>> I can't believe it's really new.
[...]

>Did you see this thread on comp.lang.verilog?

>http://groups.google.com/group/comp.lang.verilog/browse_frm/thread/7cedbaf9bdd6f1ad?hl=en#

No, I don't recall reading it... looks interesting.
Thanks for the pointer.

On Feb 9, 5:57 am, Jonathan Bromley <jonathan.brom...@MYCOMPANY.com>
wrote:
> On Mon, 09 Feb 2009 10:14:09 +0000, Jonathan Bromley wrote:
> >But there's another idea coming...
>
> which is to time-division mux the two additions.
> This degrades the jitter to 2 master clock periods,
> but gives what I believe to be the most compact
> and fastest possible implementation for a phase
> accumulator whose modulus is not a power of 2.
> I removed the reset because it's fairly useless.
>
> As with the earlier implementation, this one
> can only provide output rates up to Fc/2.
>
>   library ieee;
>   use ieee.std_logic_1164.all;
>   use ieee.numeric_std.all;
>
>   entity rate_gen is
>     generic ( ref_Hz: positive := 50_000_000 );
>     port
>       ( clock : in  std_logic
>       ; rate  : in  unsigned
>       ; pulse : out std_logic
>       );
>   end;
>
>   architecture RTL_2ph of rate_gen is
>   begin
>     process (clock)
>       -- Halve the modulus to account for 2-phase operation
>       constant modulus: integer := ref_Hz/2;
>       -- This flag controls the adder multiplexing
>       variable phase: boolean;
>       variable count: integer range -2**rate'length to modulus-1 := 0;
>     begin
>       if rising_edge(clock) then
>         pulse <= '0';
>         if phase then
>           count := count - to_integer(rate);
>         elsif count < 0 then
>           count := count + modulus;
>           pulse <= '1';
>         end if;
>         phase := not phase;
>       end if;
>     end process;
>
>   end;
>
> Thanks for all the comments.
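
For anyone who wants to check the arithmetic of the two-phase version without a simulator, here is a rough Python model of the quoted process. It is only a behavioural sketch: two_phase_rate_gen and its parameter names are invented here, phase is assumed to start false, and the registered output delay is ignored.

```python
def two_phase_rate_gen(ref_hz, rate, clocks):
    """Return a list of pulse flags, one per master clock."""
    modulus = ref_hz // 2   # halved, as in the quoted architecture
    count = 0
    phase = False           # assumed initial value of the mux flag
    pulses = []
    for _ in range(clocks):
        pulse = 0
        if phase:
            count -= rate           # odd clocks: subtract the phase delta
        elif count < 0:
            count += modulus        # even clocks: wrap and emit a pulse
            pulse = 1
        phase = not phase
        pulses.append(pulse)
    return pulses
```

With ref_hz = 100 and rate = 7 the model gives 70 pulses in 1000 clocks, so the average rate is still Fc * rate / ref_Hz. Pulses can only land on every other clock, which is where the two clock periods of jitter come from, and rate has to stay at or below ref_Hz/2.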

I don't get how this is smaller or faster than any of the other
approaches.

Rick

On Feb 9, 5:14 am, Jonathan Bromley <jonathan.brom...@MYCOMPANY.com>
wrote:
> On Sun, 8 Feb 2009 22:42:04 -0800 (PST), rickman wrote:
> >This is an interesting problem, am I understanding it correctly?
>
> Yes; more correctly than I did at first, I think.
>
> Various people have correctly pointed out that the N-M
> calculation does not need to be on a timing arc, but it's
> tough to convince the tools of that.

Tough, but not impossible.  There is a way to tell the tools that any
path through a given point has a specified timing.  You need to apply
this to the net which is the output of the adder of N-M.


> Other people have correctly pointed out that my trick
> to convert 2 adders and a 2-in MUX into one adder and
> a 3-in MUX does not save any area.  I did consistently
> find, however, that it gave significantly better Fmax;
> I'm not 100% sure I know why.  If we have 6-input LUTs
> then my trick would be a very big win.

I agree that the timing should be close between the two examples.  But
adders have to be arranged in a column while the bits of a mux can be
placed anywhere close and will have good timing.  I expect this is the
sort of design that can be helped significantly by floorplanning.


> Finally, someone pointed out that the N-M calculation
> could be pipelined.  In FPGAs, with one FF per LUT
> whether you use it or not, that turns out better than
> any other form I've tried.

I'm not sure what is meant by that, but certainly it will not hurt to
add FFs to the output of N-M since they are virtually static for this
application.  By adding FFs here, you will in essence be cutting the
timing path allowing the timing analyzer to see only the portions of
the design that need to be fast.  In fact, you can eliminate the adder
altogether by having the programmed registers hold N and N-M instead
of N and M.  That *will* increase speed as well as reducing
footprint.


> Better still, if N and M
> are both constants then the tools correctly identify
> that the (N-M) pipeline register is constant, and
> optimize it away.  So my original question, and my
> original "trick", become irrelevant (except in
> Spartan-6, maybe???!!!) and my "best effort" is:

I don't know what higher level muxes they have in the 6 series of
parts.  A 6 input LUT is still not enough to support a 3 input mux and
a full adder.  The LUT needs 5 inputs for the mux plus one for the
accumulator plus one more for the carry input to each bit.


> library ieee;
> use ieee.std_logic_1164.all;
> use ieee.numeric_std.all;
>
> entity rate_gen is
>   generic ( ref_Hz: positive := 50_000_000 );
>   port
>     ( clock : in  std_logic
>     ; reset : in  std_logic
>     ; rate  : in  unsigned
>     ; pulse : out std_logic
>     );
> end;
>
> architecture RTL of rate_gen is
>
> begin
>   process (clock)
>     variable count: integer range -2**rate'length to ref_Hz-1 := 0;
>     variable wrap: natural range 0 to ref_Hz := ref_Hz;
>   begin
>     if rising_edge(clock) then
>       pulse <= '0';
>       if reset = '1' then
>         count := 0;
>       elsif count < 0 then
>         pulse <= '1';
>         count := count + wrap;
>       else
>         count := count - to_integer(rate);
>       end if;
>       wrap := ref_Hz - to_integer(rate);
>     end if;
>   end process;
> end;
>
> The synchronous reset adds a tiny amount of delay (routing???)
> and is probably unnecessary.
>
> But there's another idea coming...

Here is your code with two setup registers.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity rate_gen is
  generic ( ref_Hz: positive := 50_000_000 );
  port
    ( clock : in  std_logic
    ; reset : in  std_logic
    ; rate  : in  unsigned  -- M
    ; n_m   : in  unsigned  -- N-M, precomputed outside this block
    ; pulse : out std_logic
    );
end;

architecture RTL of rate_gen is

begin
  process (clock)
    variable count: integer range -2**rate'length to ref_Hz-1 := 0;
  begin
    if rising_edge(clock) then
      pulse <= '0';
      if reset = '1' then
        count := 0;
      elsif count < 0 then
        pulse <= '1';
        -- n_m is unsigned, so convert before adding to the integer count
        count := count + to_integer(n_m);
      else
        count := count - to_integer(rate);
      end if;
    end if;
  end process;
end;

Rick

On Feb 9, 3:58 pm, Gerhard Hoffmann <dk...@hoffmann-hochfrequenz.de>
wrote:
> On Sun, 08 Feb 2009 21:54:45 +0000, Jonathan Bromley <jonathan.brom...@MYCOMPANY.com> wrote:
> >The last line requires TWO adders, in addition to the
> >multiplexer created by the IF.  This causes a significant
> >performance hit.  That's what I was trying to fix.  I did
> >it by saying...
>
> You can add 3 numbers in the same time as 2 because the maximum
> carry generated at any bit position is still '1'.
> I.e. '1' + '1' + '1' is still '11'.
> OK, works for 4-input LUTs.

Did you forget 1 + 1 + 1 + carryin = 100 ?


> 4 numbers will make the carry chain more complicated.
> I have not tried if Virtex carry chains can take advantage of this.
> If yes, the mux should be possible in the same block.

3 numbers make the carry chain more complex.  I'm sure that if it were
practical or even possible to add three numbers using a single 4 LUT,
we would already know about it.

Rick

On Mon, 9 Feb 2009 10:18:27 -0800 (PST), Antti <Antti.Lukats@googlemail.com>
wrote:

>On Feb 9, 8:07 pm, Brian Drummond <brian_drumm...@btconnect.com>
>wrote:

>> >So I did a MINIMAL fix to get past XST;
>> >that was not enough to pass with ISIM
>>
>> Interesting because that's exactly the conditions that gave me the XST above ...
>> type mismatch between IEEE.STD_LOGIC_UNSIGNED (in the wrapper) and numeric_std
>> (in Jonathan's code).
>>
>> So I don't see how you got it through XST...

>Brian
>please LOOK below, my complete wrapper.
>As is, it passes XST right through to the bit file, all OK.

aha! The full code reveals the secret... cunningly exploiting a bug in XST to
make it handle incorrect code!

Because this uses a component declaration, XST can't error out here because it
has to assume you will supply a matching entity at elaboration; if it was to
check the available entities it could at best conclude you hadn't written it
yet, which is completely legal... if you had used an entity as I did (reducing
the work even further) it would have found the error (and did).

So it's at elaboration the error occurred; XST couldn't find a matching entity,
never mind, it used the first entity it could find with the right name and
roughly the right number of ports!
(which, being the masochist I am, I reported for ISE 7.1 and 10.1, and I'm told
it should be fixed in 11)

>but yikes in ISIM

Good for ISIM, this time!

- Brian

On Feb 8, 12:02 pm, Jonathan Bromley <jonathan.brom...@MYCOMPANY.com>
wrote:
> hi comp.arch.fpga,
> (accidentally posted to comp.lang.vhdl
> a few moments ago- sorry)
>
> The question - repeated after the explanation -
> is: here's what I think is a nifty trick; has
> anyone seen it, or been aware of it, before?
> I can't believe it's really new.
>
> I have been messing around with baud rate generators
> and suchlike - creating a pulse that's active for
> one clock period at some required repetition rate -
> and wanted to try a phase accumulator technique
> instead of a simple divider.  That makes it far
> easier to specify the frequency - it's simply the
> phase-delta input value - and easily allows for
> non-integral divide ratios, at the cost of one
> master clock period of jitter.
>
> The phase-accumulator produces pulses with a
> repetition rate of
>   Fc * M / N
> where Fc is the master clock, M is the phase delta
> and N is the counter's modulus.  However, to get
> the huge convenience of specifying M as the required
> frequency, I must make N be equal to the frequency
> of Fc, and this is unlikely to be an exact power of 2.
> So the phase accumulator works like this:
>
>   on every clock pulse...
>     if (acc < 0) then
>       acc := acc + N;
>       output_pulse <= '1';
>     else
>       output_pulse <= '0';
>     end if;
>     acc := acc - M;  -- unconditionally
>
> This is fine, but it means that on the "wrap-around"
> clock cycle I must add N-M to the accumulator;
> if either M or N is variable, that costs me another
> adder.
>
> Today I came up with an intriguing (to me) alternative:
> on the wrap-around cycle, add N to the accumulator;
> on the immediately subsequent cycle, add (-2M); on
> all other cycles, add (-M).  This is of course rather
> easy to do since 2M is just a left shift.  A few
> trial synthesis runs convinced me that it will give
> measurably better performance than the two-adder
> version.  VHDL code is appended for anyone who wants
> to play.
>
> My question is: has this trick been published anywhere?
> Or is it something that "those skilled in the art"
> already know about?  I haven't seen it before, but that
> simply means I probably haven't looked hard enough.
>
> Thanks!
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ~~ rate generator using novel wrap-around technique
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> library ieee;
> use ieee.std_logic_1164.all;
> use ieee.numeric_std.all;
>
> entity rate_gen is
>   -- Specify the master clock frequency as a generic.
>   generic ( ref_Hz: positive := 50_000_000 );
>   port
>     ( clock : in  std_logic
>     ; reset : in  std_logic  -- Synchronous reset.
>     ; rate  : in  unsigned   -- Desired output frequency
>     ; pulse : out std_logic  -- The output pulse train
>     );
> end;
>
> architecture RTL of rate_gen is
> begin
>   process (clock)
>     -- variable "count" is the accumulator
>     variable count: integer range -2**rate'length to ref_Hz-1 := 0;
>     -- variable "overflow" is the output pulse and wraparound marker
>     variable overflow: std_logic := '0';
>   begin
>     if rising_edge(clock) then
>       if reset = '1' then
>         count := 0;
>         overflow := '0';
>       elsif count < 0 then
>         overflow := '1';
>         count := count + ref_Hz;
>       elsif overflow = '1' then
>         overflow := '0';
>         count := count - (to_integer(rate) * 2);
>       else
>         overflow := '0';
>         count := count - to_integer(rate);
>       end if;
>       pulse <= overflow;
>     end if;
>   end process;
> end;
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
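
A rough way to convince yourself that the N / -2M / -M schedule is equivalent to adding N-M on the wrap cycle is to model both in software. The Python sketch below does that. It is a behavioural model only, with made-up function names, no reset, and no attempt to model the output register.

```python
def two_adder_gen(n, m, clocks):
    """Reference: add N-M on the wrap cycle, subtract M otherwise."""
    count, pulses = 0, []
    for _ in range(clocks):
        if count < 0:
            count += n - m
            pulses.append(1)
        else:
            count -= m
            pulses.append(0)
    return pulses


def double_step_gen(n, m, clocks):
    """Trick: add N on the wrap cycle, -2M on the next cycle, -M otherwise."""
    count, overflow, pulses = 0, 0, []
    for _ in range(clocks):
        if count < 0:
            overflow = 1
            count += n
        elif overflow:
            overflow = 0
            count -= 2 * m      # catch up the M skipped on the wrap cycle
        else:
            count -= m
        pulses.append(overflow)
    return pulses
```

In this model, with N = 50 and any M from 1 to 25, the two functions produce the same pulse train, with 10 * M pulses in 500 clocks. The agreement relies on M staying at or below N/2, which matches the Fc/2 limit on the output rate.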
>

Did you see this thread on comp.lang.verilog?

http://groups.google.com/group/comp.lang.verilog/browse_frm/thread/7cedbaf9bdd6f1ad?hl=en#