Every cloud has a silver lining, but it seems every rose has its thorns too.

PLLs/DCMs/DLLs (or whatever your favourite FPGA happens to offer) provide a wonderful way to create multiplied-up clocks within the device. What's more, you can line up the active clock edges so closely that you can treat the x1 and xN clock domains as if they were one single clock domain. Hold-time problems can then be avoided when crossing the boundary in either direction.

Until recently I've always avoided taking advantage of this. I have treated the x1 and xN clock domains as if they were asynchronous, using FIFOs or whatever to convey things across the boundary. But in a recent client engagement I was faced with a design in which a x2 and a x4 clock, from the same PLL, were used in a completely sensible way as if they were in the same clock domain as the original x1 clock. The TimeQuest timing analyzer (for it was Brand A that was in use on this occasion) was quite happy to deal with these crossings, giving clear-headed and (as it turned out) accurate reports of what was going on. There is no doubt that this is cool.

However, it's not so cool in RTL simulation. The PLL simulation models, not too surprisingly, introduce some delta delays between the nominally coincident clock edges. Consequently I get everything working when going in one direction (from fast clock to slow clock, as it turns out). But I get shoot-through behaviour, the RTL equivalent of a hold time violation, when crossing from slow to fast clock: data is arriving one or more delta cycles *before* the clock.

We've easily enough got around this for the present design, but I'd love to know what all you seasoned PLL/DCM users out there do about it. Do you introduce small non-zero time delays in all the signals crossing the clock domains, so that it all works in simulation? Do you treat the various clock domains as if they were asynchronous, thereby losing one of the nicest benefits of the PLLs? Or do you simply accept that it's necessary to do timing simulation in order to see what will really happen?

This is partly a plague of VHDL RTL sim (hence the posting to c.l.vhdl as well). In Verilog you can model clock gating and PLL-ish behaviour with "less" zero delay than the nonblocking assignments to your flip-flops, by taking care to use blocking assignment in all your clock paths. I have not yet tried the Verilog simulation models for the PLLs to see whether that makes any difference.

One further whinge: I haven't tried this in Brand X recently, but the Altera PLL models are spectacularly inefficient for RTL simulation. Our project is modest-sized: think SDRAM controller, a few FIFOs occupying most of the blockRAM, and a fairly small bunch of additional logic. Yet the two PLLs are responsible for at least 90% of the simulation time. OUCH. I swapped in much simpler, but perfectly adequate, in-house models and got a x10 simulation speedup.

Opinions/rants/insults welcomed. Thanks in advance.

-- Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
jonathan.bromley@MYCOMPANY.com
http://www.MYCOMPANY.com

The contents of this message may contain personal views which
are not the views of Doulos Ltd., unless specifically stated.
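The slow-to-fast failure described above can be reproduced without any vendor model. The self-contained VHDL sketch below uses hypothetical names throughout. An x1 counter is sampled on an x2 clock whose rising edges coincide with it in simulated time.

The EXTRA_DELTA generic routes clk_x2 through one more signal assignment than clk_x1. This stands in for the delta skew that a PLL simulation model can add. The monitor process then flags shoot-through whenever the x2 register has already captured the post-edge count.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical demo: a counter on the x1 clock is sampled on the x2 clock.
-- Rising edges coincide in simulated time; EXTRA_DELTA makes clk_x2 lag
-- clk_x1 by one delta cycle, as a PLL simulation model might.
entity slow_to_fast_demo is
  generic (
    EXTRA_DELTA : boolean := true;
    T_X1        : time    := 20 ns);
  port (
    shoot_through : out boolean := false);  -- latches true on a race
end entity slow_to_fast_demo;

architecture sim of slow_to_fast_demo is
  signal clk_x1_src, clk_x2_src : std_logic := '0';
  signal clk_x1, clk_x2         : std_logic := '0';
  signal q_slow, q_fast         : unsigned(3 downto 0) := (others => '0');
begin
  -- Both source clocks come from one process, so they change together.
  clkgen : process
  begin
    for i in 1 to 8 loop
      clk_x1_src <= '1'; clk_x2_src <= '1'; wait for T_X1 / 4;
                         clk_x2_src <= '0'; wait for T_X1 / 4;
      clk_x1_src <= '0'; clk_x2_src <= '1'; wait for T_X1 / 4;
                         clk_x2_src <= '0'; wait for T_X1 / 4;
    end loop;
    wait;  -- stop after 8 x1 periods
  end process clkgen;

  clk_x1 <= clk_x1_src;              -- one delta after the source

  skewed : if EXTRA_DELTA generate
    signal clk_x2_mid : std_logic := '0';
  begin
    clk_x2_mid <= clk_x2_src;
    clk_x2     <= clk_x2_mid;        -- two deltas: one behind clk_x1
  end generate skewed;

  aligned : if not EXTRA_DELTA generate
    clk_x2 <= clk_x2_src;            -- one delta: level with clk_x1
  end generate aligned;

  slow_reg : process (clk_x1)
  begin
    if rising_edge(clk_x1) then
      q_slow <= q_slow + 1;
    end if;
  end process slow_reg;

  fast_reg : process (clk_x2)
  begin
    if rising_edge(clk_x2) then
      q_fast <= q_slow;              -- the slow-to-fast crossing
    end if;
  end process fast_reg;

  -- In hardware, just after a shared edge q_fast still holds the old
  -- count, so q_fast /= q_slow. Equality means shoot-through.
  monitor : process
  begin
    wait until rising_edge(clk_x1);
    wait for T_X1 / 8;               -- let the delta cycles settle
    if q_fast = q_slow then
      shoot_through <= true;
    end if;
  end process monitor;
end architecture sim;
```

With EXTRA_DELTA => false, the two clocks change in the same delta cycle and shoot_through stays false. Reversing the crossing does not race under this particular skew. In that case an x2 register is sampled on clk_x1, so the late clock is the launching one. That fits the observation that only slow-to-fast crossings failed.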
Aligned PLL clocks in RTL simulation
Started by ●November 17, 2008
Reply by ●November 17, 2008
In comp.arch.fpga Jonathan Bromley <jonathan.bromley@mycompany.com> wrote:
...
> Opinions/rants/insults welcomed. Thanks in advance.

I have a similar problem. A 20 MHz "clock_in" is multiplied by five internally and used as "clk", and that is doubled again and used as "clkx2". The clock_in itself is not used. I use this:

    `ifdef __ICARUS__
      // Simulation only: both clocks are updated by one nonblocking
      // assignment, so their edges coincide in the same time step.
      reg clkx2 = 0;
      reg clk = 0;
      always @(posedge clk_in)
        {clk, clkx2} <= {clk, clkx2} + {2{clk_in}};
      assign alu_ctl_bits[`CMD_RST] = 1'b0;
    `else
      // Synthesis: DCMs generate the clocks, BUFGs distribute them.
      wire clk, clkx2;
      clk100 dcm0 (
        .CLKIN_IN(clk_in),
        .RST_IN(alu_ctl_cmd[`CMD_RST]),
        .CLKFX_OUT(clk80),
        .CLKIN_IBUFG_OUT()
      );
      DCM dcmac (
        .CLKIN(clk80),
        .CLKFB(clkx2),
        .RST(alu_ctl_cmd[`CMD_RST]),
        .CLK0(clkacdcm),
        .CLK2X(clkacx2dcm),
        .LOCKED(alu_ctl_bits[`CMD_RST])
      );
      BUFG clkbuf   (.I(clkacdcm),   .O(clk));
      BUFG clkx2buf (.I(clkacx2dcm), .O(clkx2));
    `endif // !`ifdef __ICARUS__

For simulation I now use clk_in == clk.

-- Uwe Bonnes                bon@elektron.ikp.physik.tu-darmstadt.de

Institut fuer Kernphysik  Schlossgartenstrasse 9  64289 Darmstadt
--------- Tel. 06151 162516 -------- Fax. 06151 164321 ----------
Reply by ●November 17, 2008
Jonathan Bromley wrote:

> We've easily enough got around this for the present
> design, but I'd love to know what all you seasoned
> PLL/DCM users out there do about it. Do you
> introduce small non-zero time delays in all the
> signals crossing the clock domains, so that it all
> works in simulation? Do you treat the various
> clock domains as if they were asynchronous, thereby
> losing one of the nicest benefits of the PLLs? Or
> do you simply accept that it's necessary to do timing
> simulation in order to see what will really happen?

Naturally, I would prefer to fix up the design to use a single clock or another "known good" synchronization scheme.

If I were forced to use both clocks, and to trust that the vendor got the analog portions of the PLL right, I would write a simplified RTL model that just trusted the vendor specs. I don't think a gate sim would make me feel better. Maybe a SPICE sim would ;)

This is analogous to the case of a two-flop bit synchronizer. I might simplify a model that gave me 'U' outputs for setup violations, because I "believe" the synchronizer will work well enough.

> ... I swapped-in
> much simpler, but perfectly adequate in-house models and
> got x10 simulation speedup.

Sounds reasonable to me.

-- Mike Treseler
Reply by ●November 17, 2008
Jonathan Bromley wrote:

> I swapped-in
> much simpler, but perfectly adequate in-house models and
> got x10 simulation speedup.

Ditto!

Regards,

-- Mark McDougall, Engineer
Virtual Logic Pty Ltd, <http://www.vl.com.au>
21-25 King St, Rockdale, 2216
Ph: +612-9599-3255   Fax: +612-9599-3266
Reply by ●November 18, 2008
Jonathan Bromley <jonathan.bromley@MYCOMPANY.com> wrote in news:5ra3i41ksqpt0r432qv6tulvijk28s7qta@4ax.com:

> [snip]
> Opinions/rants/insults welcomed. Thanks in advance.

I use a behavioural clock generator that has zero-skew outputs, specifically to avoid many of the problems you observe with vendors' PLLs.

Yet another problem: some PLL models can't accept jitter. I recently had an Altera PLL tell me that it was unlocking because my input clock was changing frequency. My input clock had a stable frequency, but with a jitter equal to the timing resolution of the simulator. That jitter is necessary to simulate clocks whose period isn't an integer multiple of the resolution, e.g. 155.52 MHz with a 1 ns resolution.

Regards,
Allan
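A zero-skew behavioural generator of the kind described above can be very small. Below is a sketch for x1, x2 and x4 clocks with coincident rising edges. It is not Allan's code. The entity name, the rst/locked ports and the one-period "lock" delay are hypothetical choices for illustration.

Every output is assigned in the same process. As a result, edges that coincide in time also coincide in delta cycle, which is the property the vendor models lack.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical zero-skew stand-in for a x1/x2/x4 PLL simulation model.
-- All outputs are assigned in one process, so edges that coincide in time
-- also coincide in delta cycle. Not a model of any vendor's PLL.
entity zero_skew_clkgen is
  generic (
    T_X1 : time := 20 ns);  -- x1 period; T_X1/8 must be exact at the sim resolution
  port (
    rst    : in  std_logic := '0';  -- sampled once per x1 period (simplification)
    clk_x1 : out std_logic := '0';
    clk_x2 : out std_logic := '0';
    clk_x4 : out std_logic := '0';
    locked : out std_logic := '0');
end entity zero_skew_clkgen;

architecture behav of zero_skew_clkgen is
begin
  gen : process
  begin
    clk_x1 <= '0'; clk_x2 <= '0'; clk_x4 <= '0'; locked <= '0';
    if rst = '1' then
      wait until rst = '0';
    end if;
    while rst = '0' loop
      -- One x1 period is 8 steps; each step is half an x4 period.
      for step in 0 to 7 loop
        if step < 4 then clk_x1 <= '1'; else clk_x1 <= '0'; end if;
        if (step / 2) mod 2 = 0 then clk_x2 <= '1'; else clk_x2 <= '0'; end if;
        if step mod 2 = 0 then clk_x4 <= '1'; else clk_x4 <= '0'; end if;
        wait for T_X1 / 8;
      end loop;
      locked <= '1';  -- after one full x1 period, mimicking a lock delay
    end loop;
  end process gen;
end architecture behav;
```

The generator steps in units of T_X1/8, so the period has to divide evenly at the simulator's resolution. That is where the jitter problem above comes from. 1/155.52 MHz is about 6.430 ns, which a 1 ns resolution cannot represent. A clock model must therefore either run at the wrong frequency or mix 6 ns and 7 ns periods to get the average right. A PLL model can see the latter as a changing frequency.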
Reply by ●November 18, 2008
On Nov 17, 3:38 pm, Mike Treseler <mtrese...@gmail.com> wrote:
> [snip]
> This is analogous to the case of a
> two flop bit synchronizer. I might simplify
> a model that gave me 'U' outputs for
> setup violations because I 'believe"
> the synchronizer will work well enough.
> [snip]

Not quite where Jonathan was headed with this, but:

Applying "standard" synchronization techniques to not-quite-asynchronous interfaces can cause, and has caused, problems. With truly asynchronous interfaces, the probability is extremely low that an input will fall within the narrow region that causes metastability lasting long enough to be a problem (with two-flop synchronizers). However, if the two clock domains are related, such an event can happen much more often (or never at all). If such events do happen (i.e. the stars align...), they will happen much more frequently (i.e. the stars will stay aligned).

If at all possible, I would take steps to ensure one of two things. Either the clocks are related and a fully synchronous interface is employed, or they are not related and asynchronous interface techniques are employed. Failing that, a three-stage synchronizer should be considered.

I have solved the simulation problem in the past by running the main clock through the same module where the DCM is. That module provides a 1:1 clock output that is delayed (in RTL) by the same number of delta cycles as the DCM delays its output, and that delayed 1:1 output is used to drive the rest of the design. This is not always easy, especially when the DCM would otherwise best be buried down at an appropriate level of hierarchy along with its associated functionality.

Andy
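The delta-matching trick above needs a 1:1 clock delayed by a chosen number of delta cycles, with no simulated time added. One way to get that in VHDL is sketched below. It assumes you have already counted how many deltas your PLL/DCM model inserts, for example by watching its outputs in the simulator's delta or event view. The names delta_delay and N_DELTAS are hypothetical.

Each `wait for 0 ns` suspends the process for exactly one delta cycle, and the final signal assignment adds one more.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helper: q follows d exactly N_DELTAS delta cycles later,
-- with no advance of simulated time. Intended for clocks, which never
-- change twice within a few delta cycles.
entity delta_delay is
  generic (N_DELTAS : positive := 1);
  port (d : in std_logic; q : out std_logic);
end entity delta_delay;

architecture sim of delta_delay is
begin
  delay : process
    variable v : std_logic;
  begin
    v := d;
    for i in 1 to N_DELTAS - 1 loop
      wait for 0 ns;          -- each zero-time wait costs one delta cycle
    end loop;
    q <= v;                   -- the update of q is the last delta cycle
    if d = v then             -- if d moved while we waited, loop at once
      wait on d;
    end if;
  end process delay;
end architecture sim;
```

In use, an instance would sit beside the PLL model. Its d input is tied to the input clock, and its q output replaces the raw input clock as the design's x1 clock.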
Reply by ●November 18, 2008
On Tue, 18 Nov 2008 06:36:22 -0800 (PST), Andy wrote:

>Applying "standard" synchronization techniques to not-quite-
>asynchronous interfaces can and has caused problems. With truly
>asynchronous interfaces, the probability that an input will fall
>within the narrow region that causes metastability lasting long enough
>to be a problem (with two flop synchronizers) is extremely rare.
>However, if the two clock domains are related, such an event can
>happen much more often (or never at all). If they do happen (i.e. the
>stars align...) they will happen much more frequently (i.e. the stars
>will stay aligned).

Yes. Worse still, you can easily lose track of which source clock edge produced the datum captured on a given destination clock edge. The quasi-static phase relationship between the two clocks is unknown and highly variable from one instance of the design to another.

I suffered this on the same recent project. Part of the design was, for very good reasons, clocked by exactly the main system clock after it had been through a chain of external buffers. That allowed the design to track temperature/voltage/process variations in the behaviour of other signals that went through similar external buffers. I had the devil of a time trying to persuade the designers that we needed to know the window within which the delayed clock would fall, so that we could decide which edge of it belonged with which edge of the master clock. Of course, no-one had thought to provide a synchronous "data valid" signal that could have been used to track this.

>I have solved the simulation problem in the past by running the main
>clock through the same module where the DCM is, and providing a 1:1
>clock output that is delayed (RTL) for the same number of delta cycles
>as the DCM delays its output. That delayed 1:1 output is used to drive
>the rest of the design. This is not always easy, especially when the
>DCM would otherwise best be buried down at an appropriate level of
>hierarchy along with it's associated functionality.

Perfect summary of the issues I was hoping to raise. Thanks.

-- Jonathan Bromley, Consultant
Reply by ●November 18, 2008
"Jonathan Bromley" <jonathan.bromley@MYCOMPANY.com> wrote in message news:jgn5i4hgvo65a0stlpsfsmlo6de0q1lugs@4ax.com...

>>I have solved the simulation problem in the past by running the main
>>clock through the same module where the DCM is, and providing a 1:1
>>clock output that is delayed (RTL) for the same number of delta cycles
>>as the DCM delays its output. [snip]
>
> Perfect summary of the issues I was hoping to raise. Thanks.

It also seems that they are already aligned if the design uses only the outputs from the DCM, i.e. CLK0, CLKDV and CLK2X, which is the way they are 'meant' to be used. Problems arise when folks subsequently add stuff to their VHDL like:

    my_clock <= his_clock;

This assignment is optimised away in real life. In simulation, however, my_clock is now a delta later than his_clock, and may no longer align with his_clock_2X.

HTH,
Syms.
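VHDL has a direct remedy for this particular trap. An alias declares a second name for an existing signal rather than a new signal, so it adds no delta cycle.

The fragment below is a sketch with hypothetical entity and port names. It crosses from the x2 domain into the renamed x1 clock. That is the direction in which a delta-delayed destination clock would let the new x2 value shoot through.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical fragment: a register on the x2 clock feeds a register on a
-- renamed x1 clock (the fast-to-slow direction, where a late destination
-- clock would cause shoot-through).
entity x2_to_x1_stage is
  port (
    his_clock, his_clock_2x : in  std_logic;
    d                       : in  std_logic;
    q_slow                  : out std_logic);
end entity x2_to_x1_stage;

architecture rtl of x2_to_x1_stage is
  -- "my_clock <= his_clock;" would create a new signal one delta cycle
  -- behind his_clock. An alias is just another name for his_clock itself.
  alias my_clock : std_logic is his_clock;
  signal q_fast : std_logic := '0';
begin
  fast_reg : process (his_clock_2x)
  begin
    if rising_edge(his_clock_2x) then
      q_fast <= d;
    end if;
  end process fast_reg;

  slow_reg : process (my_clock)
  begin
    if rising_edge(my_clock) then
      q_slow <= q_fast;          -- sees the pre-edge q_fast, as in hardware
    end if;
  end process slow_reg;
end architecture rtl;
```

Suppose the alias is replaced by `signal my_clock : std_logic;` plus `my_clock <= his_clock;`. On a shared edge, the x1 register would then capture the x2 register's new value.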
Reply by ●November 18, 2008
>PLLs/DCMs/DLLs (or whatever your favourite FPGA
>happens to offer) provide a wonderful way to create
>multiplied-up clocks within the device. What's more,
>you can line up the active clock edges so closely
>that you can treat the x1 and xN clock domains as
>if they were one single clock domain; hold times
>can be avoided when crossing the boundary in either
>direction.

Do the vendors actually support that mode? It seems reasonable, but I remember some discussion from a year or three ago where somebody eventually tracked a bug down to it not quite working. Newer silicon might take that into account.

The basic idea is that the Xilinx tools don't bother checking hold times. All their FFs have "0 hold time". What that really means is that the min clock-to-out time plus the min prop delays are enough to cover the hold time and the clock skew. The catch is that you can get additional skew if you are using two clocks, even though they should be aligned.

-- These are my opinions, not necessarily my employer's. I hate spam.
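The hold argument in the reply above can be written as a single inequality. Here t_skew is how much later the clock edge reaches the capturing flip-flop than the launching one. The subscript names are labels chosen for this sketch, not vendor terminology.

$$
t_{\mathrm{co,min}} + t_{\mathrm{prop,min}} \;\ge\; t_{\mathrm{hold}} + t_{\mathrm{skew}}
$$

Read this way, a quoted hold time of zero says only that the left-hand side covers the skew the tools expect. The relative skew between two nominally aligned PLL outputs adds to t_skew, which is the catch described above.

Zero-delay RTL simulation is the degenerate case. There t_co, t_prop and t_hold are all zero, so treating a single delta cycle on the capturing clock as an infinitesimally small positive skew is enough to violate the inequality. It shows up as shoot-through.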
Reply by ●November 18, 2008
Jonathan

> Every cloud has a silver lining, but it seems
> every rose has its thorns too.
>
> PLLs/DCMs/DLLs . . .
>
> We've easily enough got around this for the present
> design, but I'd love to know what all you seasoned
> PLL/DCM users out there do about it. Do you
> introduce small non-zero time delays in all the
> signals crossing the clock domains, so that it all
> works in simulation? Do you treat the various
> clock domains as if they were asynchronous, thereby
> losing one of the nicest benefits of the PLLs? Or
> do you simply accept that it's necessary to do timing
> simulation in order to see what will really happen?

I haven't had to do this, so I will add a fourth question. If all clocks are truly aligned, have you tried removing the delta-cycle differences by adding a small non-zero time delay (less than tperiod_Clk/2) to the clock outputs?

    Clk_X1_DS <= Clk_X1 after 1 ns ;
    Clk_X2_DS <= Clk_X2 after 1 ns ;
    Clk_X4_DS <= Clk_X4 after 1 ns ;

Since synthesis tools ignore "after" clauses (or at least are supposed to), this should be OK to add to the RTL code.

Cheers,
Jim
-- Jim Lewis
SynthWorks VHDL Training    http://www.synthworks.com

A bird in the hand may be worth two in the bush,
but it sure makes it hard to type.
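This works because `after 1 ns` schedules each transaction at t + 1 ns, no matter which delta cycle at time t produced it. All the delayed clocks therefore change together in the first delta cycle at t + 1 ns.

Below is a minimal wrapper around the three assignments above; the entity name and the T_DS generic are hypothetical. The delay is inertial, which also explains the half-period limit. A delay longer than a clock pulse would reject the clock's high and low pulses altogether.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical wrapper: every clock is delayed by the same T_DS. The
-- delayed transactions all mature at t + T_DS, in the same delta cycle,
-- however many deltas apart the inputs changed at time t.
entity clock_deskew is
  generic (T_DS : time := 1 ns);  -- keep below half the fastest clock period
  port (
    Clk_X1, Clk_X2, Clk_X4          : in  std_logic;
    Clk_X1_DS, Clk_X2_DS, Clk_X4_DS : out std_logic);
end entity clock_deskew;

architecture sim of clock_deskew is
begin
  -- Inertial delays; synthesis tools are expected to ignore "after".
  Clk_X1_DS <= Clk_X1 after T_DS;
  Clk_X2_DS <= Clk_X2 after T_DS;
  Clk_X4_DS <= Clk_X4 after T_DS;
end architecture sim;
```

Every register must then be clocked from the _DS versions. A register left on an undelayed clock would sit a whole T_DS away from the others.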





