Reply by Jonathan Bromley February 11, 2009
On Tue, 10 Feb 2009 17:13:07 GMT, Nico Coesel wrote:


>>...I wanted to deal with the case where M is
>>a variable and in that case we end up with two
>>adders which have to be cascaded somehow.
>
>You don't have to. Just say A=N-M and add A to the pulse accumulator.
>A can be calculated in a separate process.
Indeed it can, but in what way is that different from "cascaded somehow"? As I and others have already pointed out, you can pipeline the N-M addition at zero cost in an FPGA; but you still need two adders.

-- 
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
jonathan.bromley@MYCOMPANY.com
http://www.MYCOMPANY.com

The contents of this message may contain personal views which are not the views of Doulos Ltd., unless specifically stated.
Reply by Nico Coesel February 10, 2009
Jonathan Bromley <jonathan.bromley@MYCOMPANY.com> wrote:

>On Sun, 08 Feb 2009 23:23:00 GMT, Nico Coesel wrote:
>
>
>>What if you simply add N-M to the accumulator?
>>
>>  on every clock pulse...
>>    if (acc < 0) then
>>      acc := acc + (N -M);
>>      output_pulse <= '1';
>>    else
>>      output_pulse <= '0';
>>      acc := acc - M;
>>    end if;
>
>That is very good if N and M are both constants,
>but I wanted to deal with the case where M is
>a variable and in that case we end up with two
>adders which have to be cascaded somehow.
You don't have to. Just say A=N-M and add A to the pulse accumulator. A can be calculated in a separate process.

-- 
Failure does not prove something is impossible, failure simply
indicates you are not using the right tools...
"If it doesn't fit, use a bigger hammer!"
--------------------------------------------------------------
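The single-adder scheme quoted above can be checked behaviourally with a small pure VHDL function. On a wrap cycle it adds the constant N-M; on every other cycle it subtracts M. This is a sketch only. The package and function names (rate_model_pkg, count_pulses) are hypothetical, and the loop is meant for simulation or elaboration-time checks rather than as hardware. Starting from acc = 0 with M < N, the accumulator stays in the range -M to N-M-1. It therefore returns to 0 after every N clocks, having produced exactly M pulses, which gives the rate Fc * M / N.

```vhdl
-- Hypothetical behavioural model of the single-adder rate generator.
package rate_model_pkg is
  -- Count output pulses over 'cycles' clocks for modulus n and step m.
  function count_pulses (n, m, cycles : positive) return natural;
end package;

package body rate_model_pkg is
  function count_pulses (n, m, cycles : positive) return natural is
    variable acc    : integer := 0;  -- the phase accumulator
    variable pulses : natural := 0;  -- output pulses seen so far
  begin
    assert m < n report "m must be less than n" severity failure;
    for i in 1 to cycles loop
      if acc < 0 then
        -- wrap-around cycle: add the precomputed constant N-M, emit a pulse
        acc    := acc + (n - m);
        pulses := pulses + 1;
      else
        -- ordinary cycle: subtract the phase step
        acc := acc - m;
      end if;
    end loop;
    return pulses;
  end function;
end package body;
```

For example, with n = 10 and m = 3 the model gives 3 pulses in every 10 clocks.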
Reply by Gerhard Hoffmann February 10, 2009
On Mon, 9 Feb 2009 16:32:44 -0800 (PST), rickman <gnuarm@gmail.com> wrote:

>> OK, works for 4-input LUTs.
>
>Did you forget 1 + 1 + 1 + carryin = 100 ?
Ouch, I incorrectly remembered the 3-to-2 counter principle. I should not write about such stuff late at night.

Gerhard
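The 3-to-2 counter principle is the carry-save adder. A layer of independent full adders reduces three operands to a sum vector and a carry vector, with no carry propagation between bit positions. One ordinary carry-propagate adder is still needed afterwards, so three operands cost two adder levels, which agrees with the objection quoted above. This is a minimal VHDL sketch. It assumes equal-width unsigned operands, and the package and function names (csa_pkg, add3) are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

package csa_pkg is
  -- Hypothetical helper: a + b + c via one 3:2 carry-save layer
  -- followed by a single carry-propagate adder.
  -- Assumes a, b and c all have the same length.
  function add3 (a, b, c : unsigned) return unsigned;
end package;

package body csa_pkg is
  function add3 (a, b, c : unsigned) return unsigned is
    constant w : natural := a'length;
    -- normalise the operands to (w-1 downto 0)
    variable x : unsigned(w-1 downto 0) := a;
    variable y : unsigned(w-1 downto 0) := b;
    variable z : unsigned(w-1 downto 0) := c;
    -- two extra bits: 3 * (2**w - 1) < 2**(w+2)
    variable s : unsigned(w+1 downto 0) := (others => '0');  -- sum vector
    variable k : unsigned(w+1 downto 0) := (others => '0');  -- carry vector
  begin
    for i in 0 to w-1 loop
      -- each bit position is an independent full adder (3:2 counter)
      s(i)   := x(i) xor y(i) xor z(i);
      -- the carry has twice the weight of the sum bit, so it moves up one place
      k(i+1) := (x(i) and y(i)) or (x(i) and z(i)) or (y(i) and z(i));
    end loop;
    -- the one remaining carry-propagate addition
    return s + k;
  end function;
end package body;
```

For 5 + 3 + 7, the carry-save layer yields s = 1 and k = 14, and the final addition gives 15.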
Reply by Jonathan Bromley February 10, 2009
On Sun, 08 Feb 2009 23:23:00 GMT, Nico Coesel wrote:


>What if you simply add N-M to the accumulator?
>
>  on every clock pulse...
>    if (acc < 0) then
>      acc := acc + (N -M);
>      output_pulse <= '1';
>    else
>      output_pulse <= '0';
>      acc := acc - M;
>    end if;
That is very good if N and M are both constants, but I wanted to deal with the case where M is a variable, and in that case we end up with two adders which have to be cascaded somehow.

I revisited the problem for the case where N and M are both constants, and noted that you can easily precalculate the greatest common divisor of N and M and thereby reduce the fraction M/N to its lowest terms. That shrinks the accumulator, and hence the adder, with no extra human effort, which rates pretty highly on my lazy man's list of desiderata. Here's the code...

  library ieee;
  use ieee.std_logic_1164.all;

  entity fixed_rate_gen is
    generic (divisor, multiplier: positive);
    port (clock: in std_logic; pulse: out std_logic);
  end;

  architecture rtl of fixed_rate_gen is

    -- put this in a package, or in the architecture
    function euclid_gcd(divisor, multiplier: positive) return positive is
      variable r0, r1, r: natural;
    begin
      -- multiplier = divisor would make wrap = 0, which is not a positive
      assert multiplier < divisor
        report "Multiplier must be less than divisor"
        severity failure;
      r0 := multiplier;
      r1 := divisor;
      while r0 /= 0 loop
        r  := r1 rem r0;
        r1 := r0;
        r0 := r;
      end loop;
      return r1;
    end;

    constant gcd:  positive := euclid_gcd(divisor, multiplier);
    constant m:    positive := multiplier / gcd;
    constant wrap: positive := (divisor / gcd) - m;

  begin

    process (clock)
      variable acc: integer range -m to wrap-1 := 0;
    begin
      if rising_edge(clock) then
        if acc < 0 then
          acc := acc + wrap;
          pulse <= '1';
        else
          acc := acc - m;
          pulse <= '0';
        end if;
      end if;
    end process;

  end;

I love the Euclid GCD algorithm - so neat and simple, so non-obvious (to me at least).
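Here is a worked example of the reduction. The numbers are illustrative and do not come from the thread: divisor = 50 000 000 (a 50 MHz clock) and multiplier = 115 200 (a 115 200 baud tick). Euclid gives 50 000 000 rem 115 200 = 3200, then 115 200 rem 3200 = 0:

```latex
\begin{aligned}
\gcd(50\,000\,000,\ 115\,200) &= 3200 \\
m &= 115\,200 / 3200 = 36 \\
\mathit{divisor}/\gcd &= 50\,000\,000 / 3200 = 15\,625 \\
\mathit{wrap} &= 15\,625 - 36 = 15\,589
\end{aligned}
```

With these reduced constants, acc only needs the range -36 to 15 588, which fits in a 15-bit signed value. With the unreduced constants it would need -115 200 to 49 884 799, which takes 27 bits.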
Reply by Jonathan Bromley February 10, 2009
On Mon, 9 Feb 2009 14:13:23 -0800 (PST), Gabor wrote:

>On Feb 8, 12:02 pm, Jonathan Bromley wrote:
>> The question - repeated after the explanation -
>> is: here's what I think is a nifty trick; has
>> anyone seen it, or been aware of it, before?
>> I can't believe it's really new.
[...]
>Did you see this thread on comp.lang.verilog?
>http://groups.google.com/group/comp.lang.verilog/browse_frm/thread/7cedbaf9bdd6f1ad?hl=en#

No, I don't recall reading it... looks interesting. Thanks for the pointer.
Reply by rickman February 9, 2009
On Feb 9, 5:57 am, Jonathan Bromley <jonathan.brom...@MYCOMPANY.com>
wrote:
> On Mon, 09 Feb 2009 10:14:09 +0000, Jonathan Bromley wrote:
>
> >But there's another idea coming...
>
> which is to time-division mux the two additions.
> This degrades the jitter to 2 master clock periods,
> but gives what I believe to be the most compact
> and fastest possible implementation for a phase
> accumulator whose modulus is not a power of 2.
> I removed the reset because it's fairly useless.
>
> As with the earlier implementation, this one
> can only provide output rates up to Fc/2.
>
>   library ieee;
>   use ieee.std_logic_1164.all;
>   use ieee.numeric_std.all;
>
>   entity rate_gen is
>     generic ( ref_Hz: positive := 50_000_000 );
>     port
>       ( clock : in  std_logic
>       ; rate  : in  unsigned
>       ; pulse : out std_logic
>       );
>   end;
>
>   architecture RTL_2ph of rate_gen is
>   begin
>     process (clock)
>       -- Halve the modulus to account for 2-phase operation
>       constant modulus: integer := ref_Hz/2;
>       -- This flag controls the adder multiplexing
>       variable phase: boolean;
>       variable count: integer range -2**rate'length to modulus-1 := 0;
>     begin
>       if rising_edge(clock) then
>         pulse <= '0';
>         if phase then
>           count := count - to_integer(rate);
>         elsif count < 0 then
>           count := count + modulus;
>           pulse <= '1';
>         end if;
>         phase := not phase;
>       end if;
>     end process;
>
>   end;
>
> Thanks for all the comments.
I don't get how this is smaller or faster than any of the other approaches. Rick
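The following shows why halving the modulus keeps the output rate correct in the two-phase version quoted above. It assumes ref_Hz is even, so that ref_Hz/2 is exact. Write F_c for ref_Hz (the master clock in Hz), R for the value on rate, and P for output pulses per second. The subtraction happens on every second clock, and each pulse adds back the modulus F_c/2. Since the accumulator stays bounded, the two totals must balance in the long run:

```latex
\underbrace{R \cdot \tfrac{F_c}{2}}_{\text{subtracted per second}}
  \;=\;
\underbrace{P \cdot \tfrac{F_c}{2}}_{\text{added back per second}}
\quad\Longrightarrow\quad
P = R
```

A pulse can only be issued on the alternate, non-subtracting clocks, so P can never exceed F_c/2. That is the limit stated in the quoted post.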
Reply by rickman February 9, 2009
On Feb 9, 5:14 am, Jonathan Bromley <jonathan.brom...@MYCOMPANY.com>
wrote:
> On Sun, 8 Feb 2009 22:42:04 -0800 (PST), rickman wrote:
> >This is an interesting problem, am I understanding it correctly?
>
> Yes; more correctly than I did at first, I think.
>
> Various people have correctly pointed out that the N-M
> calculation does not need to be on a timing arc, but it's
> tough to convince the tools of that.
Tough, but not impossible. There is a way to tell the tools that any path through a given point has a specified timing (a multicycle or false-path constraint). You need to apply this to the net at the output of the N-M adder.
> Other people have correctly pointed out that my trick
> to convert 2 adders and a 2-in MUX into one adder and
> a 3-in MUX does not save any area.  I did consistently
> find, however, that it gave significantly better Fmax;
> I'm not 100% sure I know why.  If we have 6-input LUTs
> then my trick would be a very big win.
I agree that the timing should be close between the two examples. But adders have to be arranged in a column while the bits of a mux can be placed anywhere close and will have good timing. I expect this is the sort of design that can be helped significantly by floorplanning.
> Finally, someone pointed out that the N-M calculation
> could be pipelined.  In FPGAs, with one FF per LUT
> whether you use it or not, that turns out better than
> any other form I've tried.
I'm not sure what is meant by that, but certainly it will not hurt to add FFs to the output of N-M since they are virtually static for this application. By adding FFs here, you will in essence be cutting the timing path allowing the timing analyzer to see only the portions of the design that need to be fast. In fact, you can eliminate the adder altogether by having the programmed registers hold N and N-M instead of N and M. That *will* increase speed as well as reducing footprint.
> Better still, if N and M
> are both constants then the tools correctly identify
> that the (N-M) pipeline register is constant, and
> optimize it away.  So my original question, and my
> original "trick", become irrelevant (except in
> Spartan-6, maybe???!!!) and my "best effort" is:
I don't know what higher-level muxes they have in the 6-series parts. A 6-input LUT is still not enough to support a 3-input mux and a full adder. The LUT needs 5 inputs for the mux (3 data plus 2 select), plus one for the accumulator, plus one more for the carry input to each bit.
> library ieee;
> use ieee.std_logic_1164.all;
> use ieee.numeric_std.all;
>
> entity rate_gen is
>   generic ( ref_Hz: positive := 50_000_000 );
>   port
>     ( clock : in  std_logic
>     ; reset : in  std_logic
>     ; rate  : in  unsigned
>     ; pulse : out std_logic
>     );
> end;
>
> architecture RTL of rate_gen is
>
> begin
>   process (clock)
>     variable count: integer range -2**rate'length to ref_Hz-1 := 0;
>     variable wrap: natural range 0 to ref_Hz := ref_Hz;
>   begin
>     if rising_edge(clock) then
>       pulse <= '0';
>       if reset = '1' then
>         count := 0;
>       elsif count < 0 then
>         pulse <= '1';
>         count := count + wrap;
>       else
>         count := count - to_integer(rate);
>       end if;
>       wrap := ref_Hz - to_integer(rate);
>     end if;
>   end process;
> end;
>
> The synchronous reset adds a tiny amount of delay (routing???)
> and is probably unnecessary.
>
> But there's another idea coming...
Here is your code with two setup registers.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity rate_gen is
  generic ( ref_Hz: positive := 50_000_000 );
  port
    ( clock : in  std_logic
    ; reset : in  std_logic
    ; rate  : in  unsigned   -- M, the phase step
    ; n_m   : in  unsigned   -- precomputed N-M (ref_Hz - rate)
    ; pulse : out std_logic
    );
end;

architecture RTL of rate_gen is
begin
  process (clock)
    variable count: integer range -2**rate'length to ref_Hz-1 := 0;
  begin
    if rising_edge(clock) then
      pulse <= '0';
      if reset = '1' then
        count := 0;
      elsif count < 0 then
        pulse <= '1';
        count := count + to_integer(n_m);  -- n_m is unsigned: convert before adding
      else
        count := count - to_integer(rate);
      end if;
    end if;
  end process;
end;

Rick
Reply by rickman February 9, 2009
On Feb 9, 3:58 pm, Gerhard Hoffmann <dk...@hoffmann-hochfrequenz.de>
wrote:
> On Sun, 08 Feb 2009 21:54:45 +0000, Jonathan Bromley <jonathan.brom...@MYCOMPANY.com> wrote:
> >The last line requires TWO adders, in addition to the
> >multiplexer created by the IF.  This causes a significant
> >performance hit.  That's what I was trying to fix.  I did
> >it by saying...
>
> You can add 3 numbers in the same time as 2 because the maximum
> carry generated at any bit position is still '1'.
> I.e. '1' + '1' + '1' is still '11'.
> OK, works for 4-input LUTs.
Did you forget 1 + 1 + 1 + carryin = 100 ?
> 4 numbers will make the carry chain more complicated.
> I have not tried if Virtex carry chains can take advantage of this.
> If yes, the mux should be possible in the same block.
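3 numbers already make the carry chain more complex: with a carry-in, one bit position can sum to 1 + 1 + 1 + 1 = 100 in binary, so a single carry bit is no longer enough. I'm sure that if it were practical, or even possible, to add three numbers using a single 4-input LUT, we would already know about it.

Rick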
Reply by Brian Drummond February 9, 2009
On Mon, 9 Feb 2009 10:18:27 -0800 (PST), Antti <Antti.Lukats@googlemail.com>
wrote:

>On Feb 9, 8:07 pm, Brian Drummond <brian_drumm...@btconnect.com>
>wrote:
>> >So i did MINIMAL fix to get pass XST
>> >that was not enough to pass with ISIM
>>
>> Interesting because that's exactly the conditions that gave me the XST above ...
>> type mismatch between IEEE.STD_LOGIC_UNSIGNED (in the wrapper) and numeric_std
>> (in Jonathan's code).
>>
>> So I don't see how you got it through XST...
>Brian
>please LOOK below, my complete wrapper.
>As it is, it passes XST through to the bit file, all OK.
aha! The full code reveals the secret... cunningly exploiting a bug in XST to make it handle incorrect code!

Because this uses a component declaration, XST can't error out here: it has to assume you will supply a matching entity at elaboration. If it were to check the available entities, it could at best conclude you hadn't written it yet, which is completely legal. If you had used direct entity instantiation as I did (reducing the work even further), it would have found the error (and did).

So the error occurs at elaboration. XST couldn't find a matching entity but, never mind, it used the first entity it could find with the right name and roughly the right number of ports! (which, being the masochist I am, I reported for ISE 7.1 and 10.1, and I'm told it should be fixed in 11)
>but yikes in ISIM
Good for ISIM, this time! - Brian
Reply by Gabor February 9, 2009
On Feb 8, 12:02 pm, Jonathan Bromley <jonathan.brom...@MYCOMPANY.com>
wrote:
> hi comp.arch.fpga,
> (accidentally posted to comp.lang.vhdl
> a few moments ago- sorry)
>
> The question - repeated after the explanation -
> is: here's what I think is a nifty trick; has
> anyone seen it, or been aware of it, before?
> I can't believe it's really new.
>
> I have been messing around with baud rate generators
> and suchlike - creating a pulse that's active for
> one clock period at some required repetition rate -
> and wanted to try a phase accumulator technique
> instead of a simple divider.  That makes it far
> easier to specify the frequency - it's simply the
> phase-delta input value - and easily allows for
> non-integral divide ratios, at the cost of one
> master clock period of jitter.
>
> The phase-accumulator produces pulses with a
> repetition rate of
>   Fc * M / N
> where Fc is the master clock, M is the phase delta
> and N is the counter's modulus.  However, to get
> the huge convenience of specifying M as the required
> frequency, I must make N equal to Fc in Hz,
> and this is unlikely to be an exact power of 2.
> So the phase accumulator works like this:
>
>   on every clock pulse...
>     if (acc < 0) then
>       acc := acc + N;
>       output_pulse <= '1';
>     else
>       output_pulse <= '0';
>     end if;
>     acc := acc - M;  -- unconditionally
>
> This is fine, but it means that on the "wrap-around"
> clock cycle I must add N-M to the accumulator;
> if either M or N is variable, that costs me another
> adder.
>
> Today I came up with an intriguing (to me) alternative:
> on the wrap-around cycle, add N to the accumulator;
> on the immediately subsequent cycle, add (-2M); on
> all other cycles, add (-M).  This is of course rather
> easy to do since 2M is just a left shift.  A few
> trial synthesis runs convinced me that it will give
> measurably better performance than the two-adder
> version.  VHDL code is appended for anyone who wants
> to play.
>
> My question is: has this trick been published anywhere?
> Or is it something that "those skilled in the art"
> already know about?  I haven't seen it before, but that
> simply means I probably haven't looked hard enough.
>
> Thanks!
>
> ------------------------------------------------------
> -- rate generator using novel wrap-around technique
> ------------------------------------------------------
> library ieee;
> use ieee.std_logic_1164.all;
> use ieee.numeric_std.all;
>
> entity rate_gen is
>   -- Specify the master clock frequency as a generic.
>   generic ( ref_Hz: positive := 50_000_000 );
>   port
>     ( clock : in  std_logic
>     ; reset : in  std_logic  -- Synchronous reset.
>     ; rate  : in  unsigned   -- Desired output frequency
>     ; pulse : out std_logic  -- The output pulse train
>     );
> end;
>
> architecture RTL of rate_gen is
> begin
>   process (clock)
>     -- variable "count" is the accumulator
>     variable count: integer range -2**rate'length to ref_Hz-1 := 0;
>     -- variable "overflow" is the output pulse and wraparound marker
>     variable overflow: std_logic := '0';
>   begin
>     if rising_edge(clock) then
>       if reset = '1' then
>         count := 0;
>         overflow := '0';
>       elsif count < 0 then
>         overflow := '1';
>         count := count + ref_Hz;
>       elsif overflow = '1' then
>         overflow := '0';
>         count := count - (to_integer(rate) * 2);
>       else
>         overflow := '0';
>         count := count - to_integer(rate);
>       end if;
>       pulse <= overflow;
>     end if;
>   end process;
> end;
> ------------------------------------------------------
Did you see this thread on comp.lang.verilog? http://groups.google.com/group/comp.lang.verilog/browse_frm/thread/7cedbaf9bdd6f1ad?hl=en#
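The quoted original post claims the (-2M) trick matches the two-adder version, and this can be checked directly. Let a be the negative accumulator value that triggers a wrap. In the two-adder form, a always lies in the range -M to -1. Over the wrap cycle and the one after it:

```latex
\begin{aligned}
\text{two-adder form:}\quad & (a + N - M) - M = a + N - 2M \\
\text{$-2M$ trick:}\quad    & (a + N) - 2M \phantom{{}-M} = a + N - 2M
\end{aligned}
```

Both forms reach the same value after two cycles, so by induction they agree on every cycle except the one right after a wrap. On that cycle the trick's accumulator is higher by M. Both intermediate values are non-negative when M <= N/2: they are at least N - 2M and N - M respectively. So neither form wraps on the following cycle, and the pulse patterns are identical. With N equal to Fc, this is the same Fc/2 ceiling mentioned elsewhere in the thread.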