Aligned PLL clocks in RTL simulation

Started by Jonathan Bromley November 17, 2008
Every cloud has a silver lining, but it seems 
every rose has its thorns too.

PLLs/DCMs/DLLs (or whatever your favourite FPGA
happens to offer) provide a wonderful way to create
multiplied-up clocks within the device.  What's more,
you can line up the active clock edges so closely
that you can treat the x1 and xN clock domains as
if they were one single clock domain; hold times
can be avoided when crossing the boundary in either
direction.

Until recently I've always avoided taking advantage
of this, and have treated the x1 and xN clock domains
as if they were asynchronous, using FIFOs or whatever
to convey things across the boundary.  But in a recent
client engagement I was faced with a design in which 
a x2 and x4 clock, from the same PLL, were used in 
a completely sensible way as if they were in the same
clock domain as the original x1 clock.  The TimeQuest
timing analyzer (for it was Brand A that was in use
on this occasion) was quite happy to deal with these
crossings, giving clear-headed and (as it turned out)
accurate reports of what was going on.  There is no
doubt that this is cool.

However, it's not so cool in RTL simulation.  The
PLL simulation models, not too surprisingly, 
introduce some delta delays between the
nominally coincident clock edges.  Consequently
I get everything working when going in one direction
(from fast clock to slow clock, as it turns out)
but I get shoot-through behaviour, the RTL equivalent
of a hold time violation, when crossing from slow to
fast clock; data is arriving one or more delta cycles
*before* the clock.
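
The race described above can be reproduced without any vendor model. What follows is a minimal sketch under stated assumptions, not the client design: the testbench name, the signal names and the 20 ns / 10 ns periods are invented for illustration, and the plain signal assignment to clk_x2 merely stands in for whatever delta delays a real PLL model introduces.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical testbench; run it for a few tens of ns.
entity delta_race_tb is
end entity delta_race_tb;

architecture sim of delta_race_tb is
  signal clk_x1     : std_logic := '0';  -- slow clock, rises at 10, 30, 50 ns
  signal clk_x2_raw : std_logic := '1';  -- fast clock, rises at 10, 20, 30 ns
  signal clk_x2     : std_logic := '1';  -- fast clock, one delta later
  signal count      : natural   := 0;    -- launched in the slow domain
  signal captured   : natural   := 0;    -- captured in the fast domain
begin
  clk_x1     <= not clk_x1     after 10 ns;
  clk_x2_raw <= not clk_x2_raw after 5 ns;

  -- Assumption: stands in for the extra delta delay of a PLL model.
  clk_x2 <= clk_x2_raw;

  launch : process (clk_x1)
  begin
    if rising_edge(clk_x1) then
      count <= count + 1;
    end if;
  end process launch;

  -- count changes in the same delta in which clk_x2 rises,
  -- so this process already sees the NEW value: shoot-through.
  capture : process (clk_x2)
  begin
    if rising_edge(clk_x2) then
      captured <= count;
    end if;
  end process capture;

  monitor : process
  begin
    wait for 12 ns;
    -- Reports 1; with truly coincident edges it would still be 0.
    report "captured at 12 ns = " & integer'image(captured);
    wait;
  end process monitor;
end architecture sim;
```

Deleting the clk_x2 assignment and clocking the capture process directly from clk_x2_raw puts both edges in the same delta, and the reported value becomes 0.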

We've easily enough got around this for the present
design, but I'd love to know what all you seasoned
PLL/DCM users out there do about it.  Do you 
introduce small non-zero time delays in all the
signals crossing the clock domains, so that it all
works in simulation?  Do you treat the various
clock domains as if they were asynchronous, thereby
losing one of the nicest benefits of the PLLs?  Or
do you simply accept that it's necessary to do timing
simulation in order to see what will really happen?

This is partly a plague of VHDL RTL sim (hence the 
posting to c.l.vhdl as well); in Verilog you can
model clock gating and PLL-ish behaviour with "less"
zero delay than the nonblocking assignments to your 
flip-flops, by taking care to use blocking assignment 
in all your clock paths.  I have not yet tried the 
Verilog simulation models for the PLLs to see whether 
that makes any difference.

One further whinge: I haven't tried this in Brand X
recently, but the Altera PLL models are spectacularly
inefficient for RTL simulation.  In our modest-size
project - think SDRAM controller, a few FIFOs occupying
most of the blockRAM, and a fairly small bunch of
additional logic - the two PLLs are responsible for
at least 90% of the simulation time - OUCH.  I swapped-in
much simpler, but perfectly adequate in-house models and
got x10 simulation speedup.

Opinions/rants/insults welcomed.  Thanks in advance.
-- 
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
jonathan.bromley@MYCOMPANY.com
http://www.MYCOMPANY.com

The contents of this message may contain personal views which 
are not the views of Doulos Ltd., unless specifically stated.

In comp.arch.fpga Jonathan Bromley <jonathan.bromley@mycompany.com> wrote:
...

> Opinions/rants/insults welcomed. Thanks in advance.
I have a similar problem: a 20 MHz "clock_in", used internally multiplied
by five as "clk" and also used doubled as "clkx2". The clock_in itself is
not used. I use this:

`ifdef __ICARUS__
   reg clkx2 = 0;
   reg clk = 0;
   always @(posedge clk_in)
     {clk, clkx2} <= {clk, clkx2} + {2{clk_in}};
   assign alu_ctl_bits[`CMD_RST] = 1'b0;
`else
   wire clk, clkx2;
   clk100 dcm0 (
       .CLKIN_IN(clk_in),
       .RST_IN(alu_ctl_cmd[`CMD_RST]),
       .CLKFX_OUT(clk80),
       .CLKIN_IBUFG_OUT()
       );
   DCM dcmac (
       .CLKIN(clk80),
       .CLKFB(clkx2),
       .RST(alu_ctl_cmd[`CMD_RST]),
       .CLK0(clkacdcm),
       .CLK2X(clkacx2dcm),
       .LOCKED(alu_ctl_bits[`CMD_RST]));
   BUFG clkbuf(.I(clkacdcm),.O(clk));
   BUFG clkx2buf(.I(clkacx2dcm),.O(clkx2));
`endif // !`ifdef __ICARUS__

For simulation I now use clk_in == clk

-- 
Uwe Bonnes                bon@elektron.ikp.physik.tu-darmstadt.de

Institut fuer Kernphysik  Schlossgartenstrasse 9  64289 Darmstadt
--------- Tel. 06151 162516 -------- Fax. 06151 164321 ----------

Jonathan Bromley wrote:

> We've easily enough got around this for the present
> design, but I'd love to know what all you seasoned
> PLL/DCM users out there do about it.  Do you
> introduce small non-zero time delays in all the
> signals crossing the clock domains, so that it all
> works in simulation?  Do you treat the various
> clock domains as if they were asynchronous, thereby
> losing one of the nicest benefits of the PLLs?  Or
> do you simply accept that it's necessary to do timing
> simulation in order to see what will really happen?

Naturally, I would prefer to fix up the design
to use a single clock or another "known good"
synchronization scheme.

If I were forced to use both clocks,
and to trust that the vendor got the
analog portions of the PLL right, I would
write a simplified rtl model that
just trusted the vendor specs.
I don't think a gate sim would make me feel better.
Maybe a SPICE sim would ;)

This is analogous to the case of a
two flop bit synchronizer. I might simplify
a model that gave me 'U' outputs for
setup violations because I "believe"
the synchronizer will work well enough.

> ... I swapped-in
> much simpler, but perfectly adequate in-house models and
> got x10 simulation speedup.

Sounds reasonable to me.

   -- Mike Treseler

Jonathan Bromley wrote:

> I swapped-in
> much simpler, but perfectly adequate in-house models and
> got x10 simulation speedup.

Ditto!

Regards,

-- 
Mark McDougall, Engineer
Virtual Logic Pty Ltd, <http://www.vl.com.au>
21-25 King St, Rockdale, 2216
Ph: +612-9599-3255 Fax: +612-9599-3266

I use a behavioural clock generator that has 0 skew outputs,
specifically to avoid many of the problems you observe with vendors'
PLLs.

Yet another problem: Some PLL models can't accept jitter. I recently
had an Altera PLL tell me that it was unlocking because my input clock
was changing frequency. My input clock had a stable frequency, but
with a jitter equal to the timing resolution of the simulator (which
is necessary to simulate clocks that have a period that isn't an
integer multiple of the resolution, e.g. 155.52MHz with a 1ns
resolution).

Regards,
Allan

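A behavioural generator of the kind mentioned above might look roughly like this. It is only a sketch: the entity name sim_clkgen, its generic and the 5 ns default are assumptions made for illustration, and it deliberately ignores everything a real PLL does besides producing aligned edges (reference input, lock time, jitter checking). All three outputs are assigned in one pass of one process, so their edges always fall in the same delta cycle.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical simulation-only clock source; not a vendor component.
entity sim_clkgen is
  generic (
    X4_HALF_PERIOD : time := 5 ns  -- assumed: x4 = 100 MHz, x1 = 25 MHz
  );
  port (
    clk_x1 : out std_logic;
    clk_x2 : out std_logic;
    clk_x4 : out std_logic
  );
end entity sim_clkgen;

architecture behav of sim_clkgen is
  function to_sl (b : boolean) return std_logic is
  begin
    if b then
      return '1';
    else
      return '0';
    end if;
  end function to_sl;
begin
  gen : process
    -- Counts x4 half periods; eight of them make one x1 period.
    variable phase : natural range 0 to 7 := 0;
  begin
    -- One pass drives every output, so all edges share a delta.
    clk_x4 <= to_sl(phase mod 2 = 0);        -- rises at phase 0, 2, 4, 6
    clk_x2 <= to_sl((phase / 2) mod 2 = 0);  -- rises at phase 0 and 4
    clk_x1 <= to_sl(phase < 4);              -- rises at phase 0 only
    wait for X4_HALF_PERIOD;
    phase := (phase + 1) mod 8;
  end process gen;
end architecture behav;
```

The zero-skew property only survives if the outputs are wired straight to the clocked processes; any intermediate signal assignment on one of them adds a delta again.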
On Nov 17, 3:38 pm, Mike Treseler <mtrese...@gmail.com> wrote:
> Jonathan Bromley wrote:
> > We've easily enough got around this for the present
> > design, but I'd love to know what all you seasoned
> > PLL/DCM users out there do about it.  Do you
> > introduce small non-zero time delays in all the
> > signals crossing the clock domains, so that it all
> > works in simulation?  Do you treat the various
> > clock domains as if they were asynchronous, thereby
> > losing one of the nicest benefits of the PLLs?  Or
> > do you simply accept that it's necessary to do timing
> > simulation in order to see what will really happen?
>
> Naturally, I would prefer to fix up the design
> to use a single clock or another "known good"
> synchronization scheme.
>
> If I were forced to use both clocks,
> and to trust that the vendor got the
> analog portions of the PLL right, I would
> write a simplified rtl model that
> just trusted the vendor specs.
> I don't think a gate sim would make me feel better.
> Maybe a SPICE sim would ;)
>
> This is analogous to the case of a
> two flop bit synchronizer. I might simplify
> a model that gave me 'U' outputs for
> setup violations because I 'believe"
> the synchronizer will work well enough.
>
> > ... I swapped-in
> > much simpler, but perfectly adequate in-house models and
> > got x10 simulation speedup.
>
> Sounds reasonable to me.
>
>    -- Mike Treseler

Not quite where Jonathan was headed with this, but:

Applying "standard" synchronization techniques to
not-quite-asynchronous interfaces can and has caused problems. With
truly asynchronous interfaces, the probability that an input will fall
within the narrow region that causes metastability lasting long enough
to be a problem (with two flop synchronizers) is extremely low.
However, if the two clock domains are related, such an event can
happen much more often (or never at all). If they do happen (i.e. the
stars align...) they will happen much more frequently (i.e. the stars
will stay aligned).

If at all possible I would take steps to ensure that either the clocks
are related and a fully synchronous interface is employed, or that
they are not related and asynchronous interface techniques are
employed. Failing that, a three stage synchronizer should be
considered.

I have solved the simulation problem in the past by running the main
clock through the same module where the DCM is, and providing a 1:1
clock output that is delayed (RTL) for the same number of delta cycles
as the DCM delays its output. That delayed 1:1 output is used to drive
the rest of the design. This is not always easy, especially when the
DCM would otherwise best be buried down at an appropriate level of
hierarchy along with its associated functionality.

Andy

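The delta-matching idea in the last paragraph above can be sketched as follows. Everything here is an assumption made for illustration: the entity name clock_module, the two-delta figure, and above all the "doubler" process, which is only a stand-in for a real DCM/PLL model so that the example is self-contained. With a real vendor model the number of deltas has to be found by inspecting its output in the simulator, and the 1:1 path adjusted to match.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical clock module; the doubler stands in for a DCM model.
entity clock_module is
  generic (
    X2_HALF_PERIOD : time := 5 ns  -- assumed: clk_in period is 20 ns
  );
  port (
    clk_in : in  std_logic;
    clk_x1 : out std_logic;  -- delta-matched 1:1 copy of clk_in
    clk_x2 : out std_logic   -- doubled clock
  );
end entity clock_module;

architecture sim of clock_module is
  signal clk_x2_int : std_logic := '0';
  signal clk_x1_int : std_logic := '0';
begin
  -- Stand-in for the vendor model: one pulse per clk_in edge.
  -- Its output changes one delta after clk_in.
  doubler : process (clk_in)
  begin
    if rising_edge(clk_in) or falling_edge(clk_in) then
      clk_x2_int <= '1', '0' after X2_HALF_PERIOD;
    end if;
  end process doubler;

  clk_x2 <= clk_x2_int;      -- second delta on the x2 path

  -- 1:1 path padded to the same two deltas as the x2 path.
  clk_x1_int <= clk_in;      -- first delta
  clk_x1     <= clk_x1_int;  -- second delta
end architecture sim;
```

The rest of the design is then clocked from clk_x1 rather than from clk_in, so that rising edges of clk_x1 and clk_x2 land in the same delta.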
On Tue, 18 Nov 2008 06:36:22 -0800 (PST), Andy wrote:

>Applying "standard" synchronization techniques to not-quite-
>asynchronous interfaces can and has caused problems. With truly
>asynchronous interfaces, the probability that an input will fall
>within the narrow region that causes metastability lasting long enough
>to be a problem (with two flop synchronizers) is extremely rare.
>However, if the two clock domains are related, such an event can
>happen much more often (or never at all). If they do happen (i.e. the
>stars align...) they will happen much more frequently (i.e. the stars
>will stay aligned).

Yes.  Worse still, you can easily lose track of which
source clock gave rise to the datum on a given destination
clock, because the quasi-static phase relationship between
the two clocks is unknown and highly variable from one
instance of the design to another.

I suffered this on the same recent project: part of the
design was, for very good reasons, clocked by exactly the
main system clock that had been through a chain of external
buffers (thereby allowing the design to track
temperature/voltage/process variations in the behaviour of
other signals that went through similar external buffers).
I had the devil of a time trying to persuade the designers
that we needed to know the window within which the delayed
clock would fall, so that we could decide which edge of it
belonged with which edge of the master clock.  Of course,
no-one had thought to provide a synchronous "data valid"
signal that could have been used to track this.

>I have solved the simulation problem in the past by running the main
>clock through the same module where the DCM is, and providing a 1:1
>clock output that is delayed (RTL) for the same number of delta cycles
>as the DCM delays its output. That delayed 1:1 output is used to drive
>the rest of the design. This is not always easy, especially when the
>DCM would otherwise best be buried down at an appropriate level of
>hierarchy along with it's associated functionality.

Perfect summary of the issues I was hoping to raise.  Thanks.
-- 
Jonathan Bromley, Consultant

"Jonathan Bromley" <jonathan.bromley@MYCOMPANY.com> wrote in message 
news:jgn5i4hgvo65a0stlpsfsmlo6de0q1lugs@4ax.com...
>
>>I have solved the simulation problem in the past by running the main
>>clock through the same module where the DCM is, and providing a 1:1
>>clock output that is delayed (RTL) for the same number of delta cycles
>>as the DCM delays its output. That delayed 1:1 output is used to drive
>>the rest of the design. This is not always easy, especially when the
>>DCM would otherwise best be buried down at an appropriate level of
>>hierarchy along with it's associated functionality.
>
> Perfect summary of the issues I was hoping to raise. Thanks.
> --
> Jonathan Bromley, Consultant
>

It also seems that if the design uses only the outputs from the DCM,
i.e. CLK0, CLKDV, CLK2X, which is the way they are 'meant' to be used,
then they are already aligned. Problems arise when folks subsequently
add stuff to their VHDL like:

    my_clock <= his_clock;

This assignment is optimised away in real life, but in the simulation
my_clock is now a delta later than his_clock, and may no longer align
with his_clock_2X.

HTH., Syms.

>PLLs/DCMs/DLLs (or whatever your favourite FPGA
>happens to offer) provide a wonderful way to create
>multiplied-up clocks within the device.  What's more,
>you can line up the active clock edges so closely
>that you can treat the x1 and xN clock domains as
>if they were one single clock domain; hold times
>can be avoided when crossing the boundary in either
>direction.

Do the vendors actually support that mode?

It seems reasonable, but I remember some discussion from a year
or three ago where somebody eventually tracked a bug down to it
not quite working.  Newer silicon might take that into account.

The basic idea is that the Xilinx tools don't bother checking
hold times.  All their FFs have "0 hold time".  What that really
means is that the min clock-to-out time plus min prop delays are
enough to cover the hold time and the clock skew.  The catch is
that you can get additional skew if you are using two clocks even
though they should be aligned.

-- 
These are my opinions, not necessarily my employer's.  I hate spam.

Jonathan
> Every cloud has a silver lining, but it seems
> every rose has its thorns too.
>
> PLLs/DCMs/DLLs . . .
>
> We've easily enough got around this for the present
> design, but I'd love to know what all you seasoned
> PLL/DCM users out there do about it.  Do you
> introduce small non-zero time delays in all the
> signals crossing the clock domains, so that it all
> works in simulation?  Do you treat the various
> clock domains as if they were asynchronous, thereby
> losing one of the nicest benefits of the PLLs?  Or
> do you simply accept that it's necessary to do timing
> simulation in order to see what will really happen?

Haven't had to do this, so I will introduce a fourth question:
if all clocks are truly aligned, have you tried removing
delta cycle differences by adding a small non-zero time
delay (less than tperiod_Clk/2) to the clock outputs?

    Clk_X1_DS <= Clk_X1 after 1 ns ;
    Clk_X2_DS <= Clk_X2 after 1 ns ;
    Clk_X4_DS <= Clk_X4 after 1 ns ;

Since synthesis tools ignore after (or at least are supposed to),
this should be ok to add to the RTL code.

Cheers,
Jim
-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jim Lewis
SynthWorks VHDL Training    http://www.synthworks.com

A bird in the hand may be worth two in the bush,
but it sure makes it hard to type.
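
To see why the "after 1 ns" suggestion above removes the delta differences, here is a small self-contained sketch. The testbench name, the periods and the deliberately delta-skewed clk_x2 are all assumptions made for illustration. Events scheduled with a non-zero time delay all arrive in the first delta of the new time step, however many deltas late the source was.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical testbench; run it for at least 25 ns.
entity deskew_tb is
end entity deskew_tb;

architecture sim of deskew_tb is
  signal clk_x1, clk_x1_ds             : std_logic := '0';
  signal clk_x2_raw, clk_x2, clk_x2_ds : std_logic := '1';
  signal count, captured               : natural   := 0;
begin
  clk_x1     <= not clk_x1     after 10 ns;  -- rises at 10, 30 ns
  clk_x2_raw <= not clk_x2_raw after 5 ns;   -- rises at 10, 20 ns
  clk_x2     <= clk_x2_raw;                  -- assumed extra delta

  -- Both delayed edges land at 11 ns in the same delta.
  clk_x1_ds <= clk_x1 after 1 ns;
  clk_x2_ds <= clk_x2 after 1 ns;

  launch : process (clk_x1_ds)
  begin
    if rising_edge(clk_x1_ds) then
      count <= count + 1;
    end if;
  end process launch;

  capture : process (clk_x2_ds)
  begin
    if rising_edge(clk_x2_ds) then
      captured <= count;  -- sees the value from before this edge
    end if;
  end process capture;

  check : process
  begin
    wait for 12 ns;
    assert captured = 0
      report "shoot-through: new data captured on the launching edge"
      severity error;
    wait for 10 ns;  -- now at 22 ns, after the next fast edge
    assert captured = 1
      report "data not captured on the following fast edge"
      severity error;
    wait;
  end process check;
end architecture sim;
```

Because these assignments use the default inertial delay, the 1 ns figure has to stay below the shortest clock half period, which is the tperiod_Clk/2 limit given above.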