comp.arch.fpga | Is this phase accumulator trick well-known???| page 3

Reply by rickman ●February 9, 20092009-02-09

On Feb 9, 12:29=A0am, Antti <Antti.Luk...@googlemail.com> wrote:
> On Feb 9, 1:12=A0am, Mike Treseler <mtrese...@gmail.com> wrote:
>
>
>
> > Jonathan Bromley wrote:
> > > It simulates correctly too, although of course it needs to
> > > be wrapped in something that provides a signal to connect
> > > to its unconstrained input port.
>
> > This may be the point that Antti missed,
> > as he mentioned "unmodified" code.
> > Without a wrapper or constraint, I wouldn't expect
> > elaboration to succeed:
>
> > # vsim -c rate_gen
> > # Loading /flip/usr1/modeltech/linux/../std.standard
> > # Loading /flip/usr1/modeltech/linux/../ieee.std_logic_1164(body)
> > # Loading /flip/usr1/modeltech/linux/../ieee.numeric_std(body)
> > # Loading work.rate_gen(rtl)#1
> > # ** Fatal: (vsim-3347) Port 'rate' is not constrained.
>
> > A minimal constraining modification like this:
> > =A0rate =A0: in =A0unsigned(15 downto 0)
> > proves that the code would sim ok, with a proper instance.
>
> > # vsim -c rate_gen
> > # Loading /flip/usr1/modeltech/linux/../std.standard
> > # Loading /flip/usr1/modeltech/linux/../ieee.std_logic_1164(body)
> > # Loading /flip/usr1/modeltech/linux/../ieee.numeric_std(body)
> > # Loading work.rate_gen(rtl)#1
> > VSIM 1> run
> > VSIM 2>
>
> > =A0-- Mike Treseler
>
> Mike I am not that dumb ;)
>
> the original code does NOT pass synthesis without wrapper
> after passing the synthesis with wrapper it does not pass
> compile for Xilinx ISIM
> the same code may pass __all__ others simulator,
> i made no statement about an of them, only that
> code making xilinx XST happy doest not pass ISIM
> so those tools VHDL support level is just different.
>
> i constrained the signal connected to rate in the wrapper as:
>
> signal rate : unsigned(15 downto 0) :=3D X"2580";
>
> this was the _needed_ trick in the wrapper.
> i assumed Jonathans code does not modifications.
> what he claims the case if the tools are fully VHDL-93 compliant
>
> Antti

ISIM is known to be very buggy compared to more mature tools.  Xilinx
will even admit that and they ask for all input about things code that
makes it fail.

I would not put much importance on the fact that any code does not
work correctly in ISIM.

Rick

Reply by rickman ●February 9, 20092009-02-09

On Feb 8, 4:32=A0pm, Jonathan Bromley <jonathan.brom...@MYCOMPANY.com>
wrote:
> On Sun, 8 Feb 2009 10:27:08 -0800 (PST), Antti wrote:
> >ok, now i KNOW too
>
> Aw, shucks, you found me out. =A0Sitting in my futuristic
> bunker in a hollowed-out volcano, stroking my fluffy
> white cat and letting out the occasional megalomaniac
> cackle, I was thinking: BWAH HAH HAH At last I have
> created something that could confuse even the great
> Antti. =A0But it was not to be so :-)
>
> >xilinx synthesis: 50mhz ref 9600 baud (fixed not variable), s3
> >FF:18
> >LUT 17
>
> Yup. =A0With a constant rate input, XST does The Right Thing and
> collapses all the constants. =A0With a variable input, you get
> approximately 2X LUT-to-FF ratio because you can't fit a
> 3-input MUX and an adder bit into a single LUT (at least,
> not in Spartan-3... maybe when we have LUT6??? .....)
>
> >used Jons unmodifed code :)
>
> It simulates correctly too, although of course it needs to
> be wrapped in something that provides a signal to connect
> to its unconstrained input port. =A0I haven't tried it with
> ISIM just yet, but I believe the code is strictly VHDL-93
> compliant and therefore it is NOT MY FAULT if some simulator
> cannot handle it. =A0By contrast, if you can find a synthesis
> tool that can't handle it then I'd be glad to know, because
> my intent was to create portable synthesisable code.

I think I understand everything that has been posted here.  I
understand that your approach may be an improvement.  But I am not
sure it should be a *LOT* faster than two adders.  Using 4 input LUTs,
an adder is no more or less logic than a mux other than the carry
chain.  Two cascaded adders will have the carry chains running in
parallel other than the extra LUT delay.  This LUT delay is no more
than what you would see adding an extra MUX.  In fact, I can picture a
hardware implementation using a single 2 input mux and two adders
where one of the adders is not in the timing loop.

The calculation is Acc <=3D Acc - (N muxed with N-M).  In the N-M
calculation both N and M are variable, but the variables are not
updated in the loop, so for the purposes of calculating clock timing,
this adder should not be part of the delay (unless you are
implementing chirp or spread spectrum).  Perhaps you need to specify
the path from the N and M registers through the N-M adder is not
important.  That should give you as good or better timing than the 3
input mux approach and does not limit the output frequency as much.
In the newer families with a 6 input LUT, the timing critical parts
will all fit in a single LUT.  I don't think that is true if you use a
3 input mux because both select signals have to update on a clock by
clock basis and none of it can be separated into another LUT without
impacting performance.

This is an interesting problem, am I understanding it correctly?

Rick

Reply by Muzaffer Kal ●February 9, 20092009-02-09

On Sun, 08 Feb 2009 21:47:05 -0700, Glen Herrmannsfeldt
<gah@ugcs.caltech.edu> wrote:

>Muzaffer Kal wrote:
>
>(after someone wrote)
>>>>  if (...overflow...) then
>>>>    modulo_adder <= modulo_adder + N;
>>>>  else
>>>>    modulo_adder <= modulo_adder + N - 1600;
>
>>>>The last line requires TWO adders, in addition to the
>>>>multiplexer created by the IF.  This causes a significant
>>>>performance hit.  That's what I was trying to fix. 
>
>(I wrote)
>>>The adders will run in parallel, so there should be no
>>>performance difference.
>
>> I don't think they can. The first line needs a 2 input adder and the
>> second line needs two 2 input adders in a tree or a 3 input adder. No
>> matter how you look at it, that path will be slower than a 2 input
>> adder; whether slow enough to matter is a different issue.
>
>I thought N was more or less constant.  

I'm not sure what that means. N is either constant in which case
"N-1600" is a constant and there is no reason to hold this discussion
(and earlier in the thread it was mentioned that N was variable to
make the generator programmable) or it is variable in which case one
needs a real adder to deal with it. (or it maybe a very slowly
changing number in which case one can use some multi-cycle constraints
but I don't think that's the topic now).

> It should probably be
>
>        modulo_adder <= modulo_adder + (N - 1600);
>
>to make sure it comes out that way, though.
>

You pay your dues either way. If N is variable you either do 
"(m_a +N) - 1600 " where you incur the cost of a 2 input adder with
variable inputs and another 2 input adder where one input is fixed
(constant) or you calculate:
"m_a + (N-1600)" where you add a constant to N and feed the result to
another 2 input adder where both inputs are variable.
In either case the path from N.Q to m_a.D is longer than it would be
without the third addend.

>In any case, it should run at least at 100MHz even with two adders
>on most current FPGAs.

Whether that's adequate or relevant really depends on what one is
doing.
-- Muzaffer Kal

DSPIA INC.
ASIC/FPGA Design Services
http://www.dspia.com

Reply by Eric Smith ●February 9, 20092009-02-09

Joseph H Allen writes:
> I remember implementing Bresenham's line drawing algorithm on 8-bit CPUs
> like this:
>
> 	suba	#N	* Subtract N
> 	bcc	no_ov	* Branch if no borrow...
> 	adda	#M	* Add M
> 	inc	cur_y	* increment y axis or whatever...
> no_ov
>
> (bonus points if you can identify the CPU and the assembler...)

Looks like MC6800/6802/6808, MC6801, MC6809, MC6809, or MC68HC11 families to me,
or the Hitachi HD6301 family.

Most of my experience with 8-bit parts from Motorola was with the MC68HC05,
MC6809, and MC68HC11 families, but IIRC the HC05 only has a single accumulator
so I don't recall using suba and adda on it.

Reply by Jonathan Bromley ●February 9, 20092009-02-09

On Sun, 8 Feb 2009 22:42:04 -0800 (PST), rickman wrote:

>This is an interesting problem, am I understanding it correctly?

Yes; more correctly than I did at first, I think.

Various people have correctly pointed out that the N-M 
calculation does not need to be on a timing arc, but it's
tough to convince the tools of that.

Other people have correctly pointed out that my trick
to convert 2 adders and a 2-in MUX into one adder and
a 3-in MUX does not save any area.  I did consistently
find, however, that it gave significantly better Fmax;
I'm not 100% sure I know why.  If we have 6-input LUTs
then my trick would be a very big win.

Finally, someone pointed out that the N-M calculation
could be pipelined.  In FPGAs, with one FF per LUT
whether you use it or not, that turns out better than
any other form I've tried.  Better still, if N and M 
are both constants then the tools correctly identify
that the (N-M) pipeline register is constant, and 
optimize it away.  So my original question, and my
original "trick", become irrelevant (except in 
Spartan-6, maybe???!!!) and my "best effort" is:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity rate_gen is
  generic ( ref_Hz: positive := 50_000_000 );
  port
    ( clock : in  std_logic
    ; reset : in  std_logic
    ; rate  : in  unsigned
    ; pulse : out std_logic
    );
end;

architecture RTL of rate_gen is

begin
  process (clock)
    variable count: integer range -2**rate'length to ref_Hz-1 := 0;
    variable wrap: natural range 0 to ref_Hz := ref_Hz;
  begin
    if rising_edge(clock) then
      pulse <= '0';
      if reset = '1' then
        count := 0;
      elsif count < 0 then
        pulse <= '1';
        count := count + wrap;
      else
        count := count - to_integer(rate);
      end if;
      wrap := ref_Hz - to_integer(rate);
    end if;
  end process;
end;

The synchronous reset adds a tiny amount of delay (routing???)
and is probably unnecessary.

But there's another idea coming...
-- 
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
jonathan.bromley@MYCOMPANY.com
http://www.MYCOMPANY.com

The contents of this message may contain personal views which 
are not the views of Doulos Ltd., unless specifically stated.

Reply by Jonathan Bromley ●February 9, 20092009-02-09

On Mon, 09 Feb 2009 10:14:09 +0000, Jonathan Bromley wrote:

>But there's another idea coming...

which is to time-division mux the two additions.
This degrades the jitter to 2 master clock periods,
but gives what I believe to be the most compact
and fastest possible implementation for a phase
accumulator whose modulus is not a power of 2.
I removed the reset because it's fairly useless.

As with the earlier implementation, this one
can only provide output rates up to Fc/2.

  library ieee;
  use ieee.std_logic_1164.all;
  use ieee.numeric_std.all;

  entity rate_gen is
    generic ( ref_Hz: positive := 50_000_000 );
    port
      ( clock : in  std_logic
      ; rate  : in  unsigned
      ; pulse : out std_logic
      );
  end;

  architecture RTL_2ph of rate_gen is
  begin
    process (clock)
      -- Halve the modulus to account for 2-phase operation
      constant modulus: integer := ref_Hz/2;
      -- This flag controls the adder multiplexing
      variable phase: boolean;
      variable count: integer range -2**rate'length to modulus-1 := 0;
    begin
      if rising_edge(clock) then
        pulse <= '0';
        if phase then
          count := count - to_integer(rate);
        elsif count < 0 then
          count := count + modulus;
          pulse <= '1';
        end if;
        phase := not phase;
      end if;
    end process;

  end;

Thanks for all the comments.
-- 
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
jonathan.bromley@MYCOMPANY.com
http://www.MYCOMPANY.com

The contents of this message may contain personal views which 
are not the views of Doulos Ltd., unless specifically stated.

Reply by ●February 9, 20092009-02-09

On Feb 9, 5:57=A0am, Jonathan Bromley <jonathan.brom...@MYCOMPANY.com>
wrote:
> On Mon, 09 Feb 2009 10:14:09 +0000, Jonathan Bromley wrote:
> >But there's another idea coming...
>
> which is to time-division mux the two additions.
> This degrades the jitter to 2 master clock periods,
> but gives what I believe to be the most compact
> and fastest possible implementation for a phase
> accumulator whose modulus is not a power of 2.
> I removed the reset because it's fairly useless.
>
> As with the earlier implementation, this one
> can only provide output rates up to Fc/2.
>
> =A0 library ieee;
> =A0 use ieee.std_logic_1164.all;
> =A0 use ieee.numeric_std.all;
>
> =A0 entity rate_gen is
> =A0 =A0 generic ( ref_Hz: positive :=3D 50_000_000 );
> =A0 =A0 port
> =A0 =A0 =A0 ( clock : in =A0std_logic
> =A0 =A0 =A0 ; rate =A0: in =A0unsigned
> =A0 =A0 =A0 ; pulse : out std_logic
> =A0 =A0 =A0 );
> =A0 end;
>
> =A0 architecture RTL_2ph of rate_gen is
> =A0 begin
> =A0 =A0 process (clock)
> =A0 =A0 =A0 -- Halve the modulus to account for 2-phase operation
> =A0 =A0 =A0 constant modulus: integer :=3D ref_Hz/2;
> =A0 =A0 =A0 -- This flag controls the adder multiplexing
> =A0 =A0 =A0 variable phase: boolean;
> =A0 =A0 =A0 variable count: integer range -2**rate'length to modulus-1 :=
=3D 0;
> =A0 =A0 begin
> =A0 =A0 =A0 if rising_edge(clock) then
> =A0 =A0 =A0 =A0 pulse <=3D '0';
> =A0 =A0 =A0 =A0 if phase then
> =A0 =A0 =A0 =A0 =A0 count :=3D count - to_integer(rate);
> =A0 =A0 =A0 =A0 elsif count < 0 then
> =A0 =A0 =A0 =A0 =A0 count :=3D count + modulus;
> =A0 =A0 =A0 =A0 =A0 pulse <=3D '1';
> =A0 =A0 =A0 =A0 end if;
> =A0 =A0 =A0 =A0 phase :=3D not phase;
> =A0 =A0 =A0 end if;
> =A0 =A0 end process;
>
> =A0 end;
>
> Thanks for all the comments.
> --
> Jonathan Bromley, Consultant
>
> DOULOS - Developing Design Know-how
> VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services
>
> Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
> jonathan.brom...@MYCOMPANY.comhttp://www.MYCOMPANY.com
>
> The contents of this message may contain personal views which
> are not the views of Doulos Ltd., unless specifically stated.

Jonathan,

I was intrigued by your first posting, but couldn't get around to
thinking about it more until today.  I have not seen that particular
trick before - you're basically just trying to increase the bit-level
correlation between two of the adder inputs (M and 2M) in hopes that
the synthesizer can make good use of the ensuing simplifications that
fall out. It's very nice conceptually, but there's always the risk
that the synthesizer can't see your idea.

Your two-phase idea is also slick. When doing numerical work, one
always has the sense that by serializing the operations, one could use
less logic and create a more compact and elegant solution.  This two-
phase concept seems to find the sweet spot between generalized state-
machine-based serial operations and the brute force all-in-one clock
cycle version.  I haven't tried synthesizing it yet but I'm inclined
to steal this for my next design if it's really as clean as it looks.

 - Kenn

Reply by Mike Treseler ●February 9, 20092009-02-09

Antti wrote:

> Mike I am not that dumb ;)

I'm the dumb one.
I had to do that exercise to figure out what the discussion was about.
I'm a package guy, not a wrapper guy ;)

        -- Mike Treseler

Reply by Brian Drummond ●February 9, 20092009-02-09

On Sun, 08 Feb 2009 21:32:40 +0000, Jonathan Bromley
<jonathan.bromley@MYCOMPANY.com> wrote:

>On Sun, 8 Feb 2009 10:27:08 -0800 (PST), Antti wrote:
>
>>ok, now i KNOW too

>>used Jons unmodifed code :)
>
>It simulates correctly too, although of course it needs to 
>be wrapped in something that provides a signal to connect
>to its unconstrained input port.  I haven't tried it with
>ISIM just yet, but I believe the code is strictly VHDL-93
>compliant and therefore it is NOT MY FAULT if some simulator
>cannot handle it.  

Well ISIM certainly has trouble with a few VHDL constructs, so I was curious
enough to try and find the problem here. I feel Xilinx sincerely want to make it
work properly, so at one point I had 8 Webcases open, most of them reporting
ISIM defects. Even 10.1.03 is significantly better than 10.1.01.

Taking the heavy hints to build a wrapper entity, I let ISE do its "new source
... VHDL Module" thing (and what exactly is a VHDL Module?), added a direct
entity instantiation, and stole Antti's intermediate signal from
>i constrained the signal connected to rate in the wrapper as:
>signal rate : unsigned(15 downto 0) := X"2580";
and tried synthesis...

ERROR:HDLParsers:800 - "/home/brian/projects/play/baudrate/baudgen.vhd" Line 50.
	Type of rate is incompatible with type of rate.

That was easily fixed...
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
--use IEEE.STD_LOGIC_ARITH.ALL;
--use IEEE.STD_LOGIC_UNSIGNED.ALL;
use ieee.numeric_std.all;
... but grrrrrrr....
then synthesis was as expected.

To try ISIM...
1) "New Source - VHDL Testbench" and associate the DUT with the wrapper entity.
2) I made three trivial changes to the testbench ... 
(a) comment out "USE ieee.std_logic_unsigned.all;"
(oddly enough, numeric_std was already there!)
(b) set the clock frequency 
    constant clock_period : time := 20ns; -- was 1 us
(c) change the startup delay
    wait for 100ns; -- was 100ms
This would have made more difference if I had remembered to drive the Reset
signal, but never mind...

ISIM reported
ERROR:HDLCompiler:69 - "baudgen.vhd" Line 47. rate_gen is not declared

at which point I changed the instantiation of Jonathan's entity in my wrapper,
to specify the "work" library.
Gen: entity rate_gen
to
Gen: entity work.rate_gen

and ISIM simply worked.

I don't know if this (requirement to specify the "work" library) is an ISIM
defect, but I think it's a lesser problem than the rest of ISE, including XST,
has with VHDL libraries - ignoring them altogether. 

Which means you can use components with the same name in different libraries,
and the simulator (ISIM included) will pick the one you specify, but XST will
simply pick whichever it feels like, ignoring your library/use/embedded
configurations. (That's supposed to be fixed in ISE11. We'll see)

ISIM still has problems though:
(1) the wave viewer won't show a 20ns pulse on a 1ms window, though the "find
next edge" button and zooming in shows they are really there...

(2) Increasing the baud rate 
signal rate : unsigned(15 downto 0) := X"25800";
ISIM spotted the deliberate mistake and an error message flashed past,
but then it ran the out of date sim anyway (which is a VERY bad design decision
IMO)

But fixing the mistake 
signal rate : unsigned(19 downto 0) := X"25800";
and simulating for 100us, all was well.

Antti, did you have a different problem?

- Brian

Reply by ●February 9, 20092009-02-09

On Feb 9, 11:19=A0am, Brian Drummond <brian_drumm...@btconnect.com>
wrote:
> On Sun, 08 Feb 2009 21:32:40 +0000, Jonathan Bromley
>
> <jonathan.brom...@MYCOMPANY.com> wrote:
> >On Sun, 8 Feb 2009 10:27:08 -0800 (PST), Antti wrote:
>
> >>ok, now i KNOW too
> >>used Jons unmodifed code :)
>
> >It simulates correctly too, although of course it needs to
> >be wrapped in something that provides a signal to connect
> >to its unconstrained input port. =A0I haven't tried it with
> >ISIM just yet, but I believe the code is strictly VHDL-93
> >compliant and therefore it is NOT MY FAULT if some simulator
> >cannot handle it. =A0
>
> Well ISIM certainly has trouble with a few VHDL constructs, so I was curi=
ous
> enough to try and find the problem here. I feel Xilinx sincerely want to =
make it
> work properly, so at one point I had 8 Webcases open, most of them report=
ing
> ISIM defects. Even 10.1.03 is significantly better than 10.1.01.
>
> Taking the heavy hints to build a wrapper entity, I let ISE do its "new s=
ource
> ... VHDL Module" thing (and what exactly is a VHDL Module?), added a dire=
ct
> entity instantiation, and stole Antti's intermediate signal from>i constr=
ained the signal connected to rate in the wrapper as:
> >signal rate : unsigned(15 downto 0) :=3D X"2580";
>
> and tried synthesis...
>
> ERROR:HDLParsers:800 - "/home/brian/projects/play/baudrate/baudgen.vhd" L=
ine 50.
> =A0 =A0 =A0 =A0 Type of rate is incompatible with type of rate.
>
> That was easily fixed...
> library IEEE;
> use IEEE.STD_LOGIC_1164.ALL;
> --use IEEE.STD_LOGIC_ARITH.ALL;
> --use IEEE.STD_LOGIC_UNSIGNED.ALL;
> use ieee.numeric_std.all;
> ... but grrrrrrr....
> then synthesis was as expected.
>
> To try ISIM...
> 1) "New Source - VHDL Testbench" and associate the DUT with the wrapper e=
ntity.
> 2) I made three trivial changes to the testbench ...
> (a) comment out "USE ieee.std_logic_unsigned.all;"
> (oddly enough, numeric_std was already there!)
> (b) set the clock frequency
> =A0 =A0 constant clock_period : time :=3D 20ns; -- was 1 us
> (c) change the startup delay
> =A0 =A0 wait for 100ns; -- was 100ms
> This would have made more difference if I had remembered to drive the Res=
et
> signal, but never mind...
>
> ISIM reported
> ERROR:HDLCompiler:69 - "baudgen.vhd" Line 47. rate_gen is not declared
>
> at which point I changed the instantiation of Jonathan's entity in my wra=
pper,
> to specify the "work" library.
> Gen: entity rate_gen
> to
> Gen: entity work.rate_gen
>
> and ISIM simply worked.
>
> I don't know if this (requirement to specify the "work" library) is an IS=
IM
> defect, but I think it's a lesser problem than the rest of ISE, including=
 XST,
> has with VHDL libraries - ignoring them altogether.
>

I believe that the requirement to use the "work" prefix is a
consequence of the name visibility rules in VHDL. The reason you can
usually get away with a non-prefixed simple component instance (as
opposed to an entity) is that you've already made the name visible by
"use.some_component_pkg.all". (UNISIM, VSIM, etc), or else explicitly
defined your component with the rest of your declarations. A "use
work.all" should achieve the same end.

> Which means you can use components with the same name in different librar=
ies,
> and the simulator (ISIM included) will pick the one you specify, but XST =
will
> simply pick whichever it feels like, ignoring your library/use/embedded
> configurations. (That's supposed to be fixed in ISE11. We'll see)
>
> ISIM still has problems though:
> (1) the wave viewer won't show a 20ns pulse on a 1ms window, though the "=
find
> next edge" button and zooming in shows they are really there...
>
> (2) Increasing the baud rate
> signal rate : unsigned(15 downto 0) :=3D X"25800";
> ISIM spotted the deliberate mistake and an error message flashed past,
> but then it ran the out of date sim anyway (which is a VERY bad design de=
cision
> IMO)
>
> But fixing the mistake
> signal rate : unsigned(19 downto 0) :=3D X"25800";
> and simulating for 100us, all was well.
>
> Antti, did you have a different problem?
>
> - Brian

Previous 1 234 5 Next

Is this phase accumulator trick well-known???

Sign in

Search forums

Free PDF Downloads

Blogs - Hall of Fame

Quick Links

About FPGARelated.com

Social Networks

The Related Media Group