FPGARelated.com
Forums

Is this phase accumulator trick well-known???

Started by Jonathan Bromley February 8, 2009
On Feb 9, 12:29=A0am, Antti <Antti.Luk...@googlemail.com> wrote:
> On Feb 9, 1:12=A0am, Mike Treseler <mtrese...@gmail.com> wrote: > > > > > Jonathan Bromley wrote: > > > It simulates correctly too, although of course it needs to > > > be wrapped in something that provides a signal to connect > > > to its unconstrained input port. > > > This may be the point that Antti missed, > > as he mentioned "unmodified" code. > > Without a wrapper or constraint, I wouldn't expect > > elaboration to succeed: > > > # vsim -c rate_gen > > # Loading /flip/usr1/modeltech/linux/../std.standard > > # Loading /flip/usr1/modeltech/linux/../ieee.std_logic_1164(body) > > # Loading /flip/usr1/modeltech/linux/../ieee.numeric_std(body) > > # Loading work.rate_gen(rtl)#1 > > # ** Fatal: (vsim-3347) Port 'rate' is not constrained. > > > A minimal constraining modification like this: > > =A0rate =A0: in =A0unsigned(15 downto 0) > > proves that the code would sim ok, with a proper instance. > > > # vsim -c rate_gen > > # Loading /flip/usr1/modeltech/linux/../std.standard > > # Loading /flip/usr1/modeltech/linux/../ieee.std_logic_1164(body) > > # Loading /flip/usr1/modeltech/linux/../ieee.numeric_std(body) > > # Loading work.rate_gen(rtl)#1 > > VSIM 1> run > > VSIM 2> > > > =A0-- Mike Treseler > > Mike I am not that dumb ;) > > the original code does NOT pass synthesis without wrapper > after passing the synthesis with wrapper it does not pass > compile for Xilinx ISIM > the same code may pass __all__ others simulator, > i made no statement about an of them, only that > code making xilinx XST happy doest not pass ISIM > so those tools VHDL support level is just different. > > i constrained the signal connected to rate in the wrapper as: > > signal rate : unsigned(15 downto 0) :=3D X"2580"; > > this was the _needed_ trick in the wrapper. > i assumed Jonathans code does not modifications. > what he claims the case if the tools are fully VHDL-93 compliant > > Antti
ISIM is known to be very buggy compared to more mature tools. Xilinx will even admit that and they ask for all input about things code that makes it fail. I would not put much importance on the fact that any code does not work correctly in ISIM. Rick
On Feb 8, 4:32=A0pm, Jonathan Bromley <jonathan.brom...@MYCOMPANY.com>
wrote:
> On Sun, 8 Feb 2009 10:27:08 -0800 (PST), Antti wrote: > >ok, now i KNOW too > > Aw, shucks, you found me out. =A0Sitting in my futuristic > bunker in a hollowed-out volcano, stroking my fluffy > white cat and letting out the occasional megalomaniac > cackle, I was thinking: BWAH HAH HAH At last I have > created something that could confuse even the great > Antti. =A0But it was not to be so :-) > > >xilinx synthesis: 50mhz ref 9600 baud (fixed not variable), s3 > >FF:18 > >LUT 17 > > Yup. =A0With a constant rate input, XST does The Right Thing and > collapses all the constants. =A0With a variable input, you get > approximately 2X LUT-to-FF ratio because you can't fit a > 3-input MUX and an adder bit into a single LUT (at least, > not in Spartan-3... maybe when we have LUT6??? .....) > > >used Jons unmodifed code :) > > It simulates correctly too, although of course it needs to > be wrapped in something that provides a signal to connect > to its unconstrained input port. =A0I haven't tried it with > ISIM just yet, but I believe the code is strictly VHDL-93 > compliant and therefore it is NOT MY FAULT if some simulator > cannot handle it. =A0By contrast, if you can find a synthesis > tool that can't handle it then I'd be glad to know, because > my intent was to create portable synthesisable code.
I think I understand everything that has been posted here. I understand that your approach may be an improvement. But I am not sure it should be a *LOT* faster than two adders. Using 4 input LUTs, an adder is no more or less logic than a mux other than the carry chain. Two cascaded adders will have the carry chains running in parallel other than the extra LUT delay. This LUT delay is no more than what you would see adding an extra MUX. In fact, I can picture a hardware implementation using a single 2 input mux and two adders where one of the adders is not in the timing loop. The calculation is Acc <=3D Acc - (N muxed with N-M). In the N-M calculation both N and M are variable, but the variables are not updated in the loop, so for the purposes of calculating clock timing, this adder should not be part of the delay (unless you are implementing chirp or spread spectrum). Perhaps you need to specify the path from the N and M registers through the N-M adder is not important. That should give you as good or better timing than the 3 input mux approach and does not limit the output frequency as much. In the newer families with a 6 input LUT, the timing critical parts will all fit in a single LUT. I don't think that is true if you use a 3 input mux because both select signals have to update on a clock by clock basis and none of it can be separated into another LUT without impacting performance. This is an interesting problem, am I understanding it correctly? Rick
On Sun, 08 Feb 2009 21:47:05 -0700, Glen Herrmannsfeldt
<gah@ugcs.caltech.edu> wrote:

>Muzaffer Kal wrote: > >(after someone wrote) >>>> if (...overflow...) then >>>> modulo_adder <= modulo_adder + N; >>>> else >>>> modulo_adder <= modulo_adder + N - 1600; > >>>>The last line requires TWO adders, in addition to the >>>>multiplexer created by the IF. This causes a significant >>>>performance hit. That's what I was trying to fix. > >(I wrote) >>>The adders will run in parallel, so there should be no >>>performance difference. > >> I don't think they can. The first line needs a 2 input adder and the >> second line needs two 2 input adders in a tree or a 3 input adder. No >> matter how you look at it, that path will be slower than a 2 input >> adder; whether slow enough to matter is a different issue. > >I thought N was more or less constant.
I'm not sure what that means. N is either constant in which case "N-1600" is a constant and there is no reason to hold this discussion (and earlier in the thread it was mentioned that N was variable to make the generator programmable) or it is variable in which case one needs a real adder to deal with it. (or it maybe a very slowly changing number in which case one can use some multi-cycle constraints but I don't think that's the topic now).
> It should probably be > > modulo_adder <= modulo_adder + (N - 1600); > >to make sure it comes out that way, though. >
You pay your dues either way. If N is variable you either do "(m_a +N) - 1600 " where you incur the cost of a 2 input adder with variable inputs and another 2 input adder where one input is fixed (constant) or you calculate: "m_a + (N-1600)" where you add a constant to N and feed the result to another 2 input adder where both inputs are variable. In either case the path from N.Q to m_a.D is longer than it would be without the third addend.
>In any case, it should run at least at 100MHz even with two adders >on most current FPGAs.
Whether that's adequate or relevant really depends on what one is doing. -- Muzaffer Kal DSPIA INC. ASIC/FPGA Design Services http://www.dspia.com
Joseph H Allen writes:
> I remember implementing Bresenham's line drawing algorithm on 8-bit CPUs > like this: > > suba #N * Subtract N > bcc no_ov * Branch if no borrow... > adda #M * Add M > inc cur_y * increment y axis or whatever... > no_ov > > (bonus points if you can identify the CPU and the assembler...)
Looks like MC6800/6802/6808, MC6801, MC6809, MC6809, or MC68HC11 families to me, or the Hitachi HD6301 family. Most of my experience with 8-bit parts from Motorola was with the MC68HC05, MC6809, and MC68HC11 families, but IIRC the HC05 only has a single accumulator so I don't recall using suba and adda on it.
On Sun, 8 Feb 2009 22:42:04 -0800 (PST), rickman wrote:

>This is an interesting problem, am I understanding it correctly?
Yes; more correctly than I did at first, I think. Various people have correctly pointed out that the N-M calculation does not need to be on a timing arc, but it's tough to convince the tools of that. Other people have correctly pointed out that my trick to convert 2 adders and a 2-in MUX into one adder and a 3-in MUX does not save any area. I did consistently find, however, that it gave significantly better Fmax; I'm not 100% sure I know why. If we have 6-input LUTs then my trick would be a very big win. Finally, someone pointed out that the N-M calculation could be pipelined. In FPGAs, with one FF per LUT whether you use it or not, that turns out better than any other form I've tried. Better still, if N and M are both constants then the tools correctly identify that the (N-M) pipeline register is constant, and optimize it away. So my original question, and my original "trick", become irrelevant (except in Spartan-6, maybe???!!!) and my "best effort" is: library ieee; use ieee.std_logic_1164.all; use ieee.numeric_std.all; entity rate_gen is generic ( ref_Hz: positive := 50_000_000 ); port ( clock : in std_logic ; reset : in std_logic ; rate : in unsigned ; pulse : out std_logic ); end; architecture RTL of rate_gen is begin process (clock) variable count: integer range -2**rate'length to ref_Hz-1 := 0; variable wrap: natural range 0 to ref_Hz := ref_Hz; begin if rising_edge(clock) then pulse <= '0'; if reset = '1' then count := 0; elsif count < 0 then pulse <= '1'; count := count + wrap; else count := count - to_integer(rate); end if; wrap := ref_Hz - to_integer(rate); end if; end process; end; The synchronous reset adds a tiny amount of delay (routing???) and is probably unnecessary. But there's another idea coming... -- Jonathan Bromley, Consultant DOULOS - Developing Design Know-how VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK jonathan.bromley@MYCOMPANY.com http://www.MYCOMPANY.com The contents of this message may contain personal views which are not the views of Doulos Ltd., unless specifically stated.
On Mon, 09 Feb 2009 10:14:09 +0000, Jonathan Bromley wrote:

>But there's another idea coming...
which is to time-division mux the two additions. This degrades the jitter to 2 master clock periods, but gives what I believe to be the most compact and fastest possible implementation for a phase accumulator whose modulus is not a power of 2. I removed the reset because it's fairly useless. As with the earlier implementation, this one can only provide output rates up to Fc/2. library ieee; use ieee.std_logic_1164.all; use ieee.numeric_std.all; entity rate_gen is generic ( ref_Hz: positive := 50_000_000 ); port ( clock : in std_logic ; rate : in unsigned ; pulse : out std_logic ); end; architecture RTL_2ph of rate_gen is begin process (clock) -- Halve the modulus to account for 2-phase operation constant modulus: integer := ref_Hz/2; -- This flag controls the adder multiplexing variable phase: boolean; variable count: integer range -2**rate'length to modulus-1 := 0; begin if rising_edge(clock) then pulse <= '0'; if phase then count := count - to_integer(rate); elsif count < 0 then count := count + modulus; pulse <= '1'; end if; phase := not phase; end if; end process; end; Thanks for all the comments. -- Jonathan Bromley, Consultant DOULOS - Developing Design Know-how VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK jonathan.bromley@MYCOMPANY.com http://www.MYCOMPANY.com The contents of this message may contain personal views which are not the views of Doulos Ltd., unless specifically stated.
On Feb 9, 5:57=A0am, Jonathan Bromley <jonathan.brom...@MYCOMPANY.com>
wrote:
> On Mon, 09 Feb 2009 10:14:09 +0000, Jonathan Bromley wrote: > >But there's another idea coming... > > which is to time-division mux the two additions. > This degrades the jitter to 2 master clock periods, > but gives what I believe to be the most compact > and fastest possible implementation for a phase > accumulator whose modulus is not a power of 2. > I removed the reset because it's fairly useless. > > As with the earlier implementation, this one > can only provide output rates up to Fc/2. > > =A0 library ieee; > =A0 use ieee.std_logic_1164.all; > =A0 use ieee.numeric_std.all; > > =A0 entity rate_gen is > =A0 =A0 generic ( ref_Hz: positive :=3D 50_000_000 ); > =A0 =A0 port > =A0 =A0 =A0 ( clock : in =A0std_logic > =A0 =A0 =A0 ; rate =A0: in =A0unsigned > =A0 =A0 =A0 ; pulse : out std_logic > =A0 =A0 =A0 ); > =A0 end; > > =A0 architecture RTL_2ph of rate_gen is > =A0 begin > =A0 =A0 process (clock) > =A0 =A0 =A0 -- Halve the modulus to account for 2-phase operation > =A0 =A0 =A0 constant modulus: integer :=3D ref_Hz/2; > =A0 =A0 =A0 -- This flag controls the adder multiplexing > =A0 =A0 =A0 variable phase: boolean; > =A0 =A0 =A0 variable count: integer range -2**rate'length to modulus-1 :=
=3D 0;
> =A0 =A0 begin > =A0 =A0 =A0 if rising_edge(clock) then > =A0 =A0 =A0 =A0 pulse <=3D '0'; > =A0 =A0 =A0 =A0 if phase then > =A0 =A0 =A0 =A0 =A0 count :=3D count - to_integer(rate); > =A0 =A0 =A0 =A0 elsif count < 0 then > =A0 =A0 =A0 =A0 =A0 count :=3D count + modulus; > =A0 =A0 =A0 =A0 =A0 pulse <=3D '1'; > =A0 =A0 =A0 =A0 end if; > =A0 =A0 =A0 =A0 phase :=3D not phase; > =A0 =A0 =A0 end if; > =A0 =A0 end process; > > =A0 end; > > Thanks for all the comments. > -- > Jonathan Bromley, Consultant > > DOULOS - Developing Design Know-how > VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services > > Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK > jonathan.brom...@MYCOMPANY.comhttp://www.MYCOMPANY.com > > The contents of this message may contain personal views which > are not the views of Doulos Ltd., unless specifically stated.
Jonathan, I was intrigued by your first posting, but couldn't get around to thinking about it more until today. I have not seen that particular trick before - you're basically just trying to increase the bit-level correlation between two of the adder inputs (M and 2M) in hopes that the synthesizer can make good use of the ensuing simplifications that fall out. It's very nice conceptually, but there's always the risk that the synthesizer can't see your idea. Your two-phase idea is also slick. When doing numerical work, one always has the sense that by serializing the operations, one could use less logic and create a more compact and elegant solution. This two- phase concept seems to find the sweet spot between generalized state- machine-based serial operations and the brute force all-in-one clock cycle version. I haven't tried synthesizing it yet but I'm inclined to steal this for my next design if it's really as clean as it looks. - Kenn
Antti wrote:

> Mike I am not that dumb ;)
I'm the dumb one. I had to do that exercise to figure out what the discussion was about. I'm a package guy, not a wrapper guy ;) -- Mike Treseler
On Sun, 08 Feb 2009 21:32:40 +0000, Jonathan Bromley
<jonathan.bromley@MYCOMPANY.com> wrote:

>On Sun, 8 Feb 2009 10:27:08 -0800 (PST), Antti wrote: > >>ok, now i KNOW too
>>used Jons unmodifed code :) > >It simulates correctly too, although of course it needs to >be wrapped in something that provides a signal to connect >to its unconstrained input port. I haven't tried it with >ISIM just yet, but I believe the code is strictly VHDL-93 >compliant and therefore it is NOT MY FAULT if some simulator >cannot handle it.
Well ISIM certainly has trouble with a few VHDL constructs, so I was curious enough to try and find the problem here. I feel Xilinx sincerely want to make it work properly, so at one point I had 8 Webcases open, most of them reporting ISIM defects. Even 10.1.03 is significantly better than 10.1.01. Taking the heavy hints to build a wrapper entity, I let ISE do its "new source ... VHDL Module" thing (and what exactly is a VHDL Module?), added a direct entity instantiation, and stole Antti's intermediate signal from
>i constrained the signal connected to rate in the wrapper as: >signal rate : unsigned(15 downto 0) := X"2580";
and tried synthesis... ERROR:HDLParsers:800 - "/home/brian/projects/play/baudrate/baudgen.vhd" Line 50. Type of rate is incompatible with type of rate. That was easily fixed... library IEEE; use IEEE.STD_LOGIC_1164.ALL; --use IEEE.STD_LOGIC_ARITH.ALL; --use IEEE.STD_LOGIC_UNSIGNED.ALL; use ieee.numeric_std.all; ... but grrrrrrr.... then synthesis was as expected. To try ISIM... 1) "New Source - VHDL Testbench" and associate the DUT with the wrapper entity. 2) I made three trivial changes to the testbench ... (a) comment out "USE ieee.std_logic_unsigned.all;" (oddly enough, numeric_std was already there!) (b) set the clock frequency constant clock_period : time := 20ns; -- was 1 us (c) change the startup delay wait for 100ns; -- was 100ms This would have made more difference if I had remembered to drive the Reset signal, but never mind... ISIM reported ERROR:HDLCompiler:69 - "baudgen.vhd" Line 47. rate_gen is not declared at which point I changed the instantiation of Jonathan's entity in my wrapper, to specify the "work" library. Gen: entity rate_gen to Gen: entity work.rate_gen and ISIM simply worked. I don't know if this (requirement to specify the "work" library) is an ISIM defect, but I think it's a lesser problem than the rest of ISE, including XST, has with VHDL libraries - ignoring them altogether. Which means you can use components with the same name in different libraries, and the simulator (ISIM included) will pick the one you specify, but XST will simply pick whichever it feels like, ignoring your library/use/embedded configurations. (That's supposed to be fixed in ISE11. We'll see) ISIM still has problems though: (1) the wave viewer won't show a 20ns pulse on a 1ms window, though the "find next edge" button and zooming in shows they are really there... (2) Increasing the baud rate signal rate : unsigned(15 downto 0) := X"25800"; ISIM spotted the deliberate mistake and an error message flashed past, but then it ran the out of date sim anyway (which is a VERY bad design decision IMO) But fixing the mistake signal rate : unsigned(19 downto 0) := X"25800"; and simulating for 100us, all was well. Antti, did you have a different problem? - Brian
On Feb 9, 11:19=A0am, Brian Drummond <brian_drumm...@btconnect.com>
wrote:
> On Sun, 08 Feb 2009 21:32:40 +0000, Jonathan Bromley > > <jonathan.brom...@MYCOMPANY.com> wrote: > >On Sun, 8 Feb 2009 10:27:08 -0800 (PST), Antti wrote: > > >>ok, now i KNOW too > >>used Jons unmodifed code :) > > >It simulates correctly too, although of course it needs to > >be wrapped in something that provides a signal to connect > >to its unconstrained input port. =A0I haven't tried it with > >ISIM just yet, but I believe the code is strictly VHDL-93 > >compliant and therefore it is NOT MY FAULT if some simulator > >cannot handle it. =A0 > > Well ISIM certainly has trouble with a few VHDL constructs, so I was curi=
ous
> enough to try and find the problem here. I feel Xilinx sincerely want to =
make it
> work properly, so at one point I had 8 Webcases open, most of them report=
ing
> ISIM defects. Even 10.1.03 is significantly better than 10.1.01. > > Taking the heavy hints to build a wrapper entity, I let ISE do its "new s=
ource
> ... VHDL Module" thing (and what exactly is a VHDL Module?), added a dire=
ct
> entity instantiation, and stole Antti's intermediate signal from>i constr=
ained the signal connected to rate in the wrapper as:
> >signal rate : unsigned(15 downto 0) :=3D X"2580"; > > and tried synthesis... > > ERROR:HDLParsers:800 - "/home/brian/projects/play/baudrate/baudgen.vhd" L=
ine 50.
> =A0 =A0 =A0 =A0 Type of rate is incompatible with type of rate. > > That was easily fixed... > library IEEE; > use IEEE.STD_LOGIC_1164.ALL; > --use IEEE.STD_LOGIC_ARITH.ALL; > --use IEEE.STD_LOGIC_UNSIGNED.ALL; > use ieee.numeric_std.all; > ... but grrrrrrr.... > then synthesis was as expected. > > To try ISIM... > 1) "New Source - VHDL Testbench" and associate the DUT with the wrapper e=
ntity.
> 2) I made three trivial changes to the testbench ... > (a) comment out "USE ieee.std_logic_unsigned.all;" > (oddly enough, numeric_std was already there!) > (b) set the clock frequency > =A0 =A0 constant clock_period : time :=3D 20ns; -- was 1 us > (c) change the startup delay > =A0 =A0 wait for 100ns; -- was 100ms > This would have made more difference if I had remembered to drive the Res=
et
> signal, but never mind... > > ISIM reported > ERROR:HDLCompiler:69 - "baudgen.vhd" Line 47. rate_gen is not declared > > at which point I changed the instantiation of Jonathan's entity in my wra=
pper,
> to specify the "work" library. > Gen: entity rate_gen > to > Gen: entity work.rate_gen > > and ISIM simply worked. > > I don't know if this (requirement to specify the "work" library) is an IS=
IM
> defect, but I think it's a lesser problem than the rest of ISE, including=
XST,
> has with VHDL libraries - ignoring them altogether. >
I believe that the requirement to use the "work" prefix is a consequence of the name visibility rules in VHDL. The reason you can usually get away with a non-prefixed simple component instance (as opposed to an entity) is that you've already made the name visible by "use.some_component_pkg.all". (UNISIM, VSIM, etc), or else explicitly defined your component with the rest of your declarations. A "use work.all" should achieve the same end.
> Which means you can use components with the same name in different librar=
ies,
> and the simulator (ISIM included) will pick the one you specify, but XST =
will
> simply pick whichever it feels like, ignoring your library/use/embedded > configurations. (That's supposed to be fixed in ISE11. We'll see) > > ISIM still has problems though: > (1) the wave viewer won't show a 20ns pulse on a 1ms window, though the "=
find
> next edge" button and zooming in shows they are really there... > > (2) Increasing the baud rate > signal rate : unsigned(15 downto 0) :=3D X"25800"; > ISIM spotted the deliberate mistake and an error message flashed past, > but then it ran the out of date sim anyway (which is a VERY bad design de=
cision
> IMO) > > But fixing the mistake > signal rate : unsigned(19 downto 0) :=3D X"25800"; > and simulating for 100us, all was well. > > Antti, did you have a different problem? > > - Brian