FPGARelated.com
Forums

Simulation deltas

Started by Carl April 3, 2014
Hi,

This question deals both with an actual problem and with some more conceptual thoughts on simulation deltas and how an RTL entity should behave with regard to them.

This post concerns the case of a simulation with ideal time - that is, no delays (in time) modelled, relying only on simulation deltas for the ordering of events.


*Conceptual*

I would argue that for a well-behaved synchronous RTL entity, the following must be true:

*All readings of the input ports must be made *on* the delta of the rising flank of the clock - not one or any other number of deltas after that.*

Would people agree on that?

It follows from the possibility that other logic, hierarchically above the entity in question, alters the input ports as little as one delta after the rising flank. That must be allowed.


*My actual problem*

After a lot of debugging of one of my simulations, I found that a Xilinx simulation primitive (IDELAYE2 in Unisim) does *not* adhere to the statement in the previous section, and that this had caused all the problems.

See the signals plotted here:
http://www.fpga-dev.com/misc/deltaDelayProblem.png

It's enough to focus on the "ports" section. The ports are:
- c: in, the clock
- cntValueIn: in
- ld: in, writeEnable for writing cntValueIn to an internal register
- cntValueOut: out, giving the contents of that register

As can be seen, my 'ld' operation is de-asserted one delta after the rising flank. I argue this should be OK, but the data is clearly never written (cntValueOut remains 0). If I delay the de-assertion of 'ld' by just one more delta, the write *does* take effect as desired.

I would argue this is a (serious) flaw of the Xilinx primitive. Would people agree on that as well?


(The following is not central to the above discussion and may be skipped.)

I have checked the actual reason for the problem. See the "internals" section of the signals. First, Xilinx delays both the clock and the ports to the *_dly signals. Fully OK, as long as everything from then on operates on the delayed signals. The problem is that the process writing to the internal register is not clocked by c_dly, but by another signal, c_in, which is delayed *one more* delta. This causes my requested 'ld' to be missed. (c_in is driven from c_dly in another process, which inverts the clock input if the user has requested that.)
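The mechanism above can be reproduced with a toy model. The Python sketch below is a minimal delta-cycle kernel (all scheduled assignments take effect together in an update phase, then sensitive processes run, and a plain `<=` lands one delta later), wired the way the description suggests: `c -> c_dly -> c_in` on the clock side and `ld -> ld_dly` on the data side. It is not the Unisim IDELAYE2 source; `DeltaSim`, `load_register`, the `ld_q*` buffer stages and the constant 5 on `cnt_in_dly` are hypothetical, chosen only for illustration.

```python
class DeltaSim:
    """Toy model of VHDL delta cycles at a single simulation time (no real time)."""

    def __init__(self):
        self.val = {}        # current signal values
        self.prev = {}       # values before the latest update phase
        self.pending = {}    # assignments that take effect in the next delta
        self.events = set()  # signals that changed in the current delta
        self.procs = []      # (sensitivity set, process body)

    def signal(self, name, init):
        self.val[name] = init

    def assign(self, name, value):
        # Like "name <= value;" without an after clause: visible one delta later.
        self.pending[name] = value

    def process(self, sensitivity, body):
        self.procs.append((set(sensitivity), body))

    def rising(self, name):
        return name in self.events and self.prev[name] == 0 and self.val[name] == 1

    def run(self, max_deltas=50):
        for _ in range(max_deltas):
            if not self.pending:
                return
            # Update phase: all scheduled assignments take effect together.
            self.prev = dict(self.val)
            self.events = {n for n, v in self.pending.items() if self.val[n] != v}
            self.val.update(self.pending)
            self.pending = {}
            # Execute phase: wake every process with an event on its sensitivity list.
            for sens, body in self.procs:
                if sens & self.events:
                    body(self)
        raise RuntimeError("delta limit reached")


def load_register(extra_user_deltas):
    """Return cnt_out after one rising edge of c; 'ld' falls 1 + extra_user_deltas deltas after c."""
    s = DeltaSim()
    for name in ("c", "c_dly", "c_in", "cnt_out"):
        s.signal(name, 0)
    s.signal("cnt_in_dly", 5)  # hypothetical value being loaded
    s.signal("ld_dly", 1)

    # User side: a flop on c clears ld, optionally through extra buffer stages.
    stages = [f"ld_q{i}" for i in range(extra_user_deltas)] + ["ld"]
    for name in stages:
        s.signal(name, 1)

    def user_flop(sim):
        if sim.rising("c"):
            sim.assign(stages[0], 0)

    s.process(["c"], user_flop)
    for src, dst in zip(stages, stages[1:]):
        s.process([src], lambda sim, src=src, dst=dst: sim.assign(dst, sim.val[src]))

    # Model side, as described: c -> c_dly -> c_in (inversion omitted), ld -> ld_dly.
    s.process(["c"], lambda sim: sim.assign("c_dly", sim.val["c"]))
    s.process(["ld"], lambda sim: sim.assign("ld_dly", sim.val["ld"]))
    s.process(["c_dly"], lambda sim: sim.assign("c_in", sim.val["c_dly"]))

    def register(sim):
        # The register is clocked by c_in, two deltas behind c.
        if sim.rising("c_in") and sim.val["ld_dly"] == 1:
            sim.assign("cnt_out", sim.val["cnt_in_dly"])

    s.process(["c_in"], register)

    s.assign("c", 1)  # testbench: rising edge of c
    s.run()
    return s.val["cnt_out"]


print(load_register(0))  # 0: ld falls 1 delta after c, write missed
print(load_register(1))  # 5: ld falls 2 deltas after c, write lands
```

In this toy model, the register process wakes in the same delta in which `ld_dly` already shows the de-asserted value, which is exactly the one-delta race seen in the waveform.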

I argue that synchronous entities must be modelled in such a way that all processes reading input ports *must* be clocked directly by the input clock port, not by some derived signal that lags behind it (if only by one delta). If this is not possible, the input ports being read must be delayed accordingly. In this case, if Xilinx wishes to conditionally invert the clock like this, causing another delta of delay, the input ports must also be delayed by the corresponding number of deltas.
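The rule and the proposed fix reduce to counting deltas on two paths: the clock path to the sampling process, and the data path from the port to the signal that process reads. A sketch of that bookkeeping, assuming every plain `<=` assignment adds exactly one delta and nothing else does; `sees_old_value` is a hypothetical helper, and the path lengths used (2 deltas for `c -> c_dly -> c_in`, 1 delta for `ld -> ld_dly`) are read off the description above.

```python
def sees_old_value(clk_path_deltas, data_path_deltas, data_change_delta, clk_edge_delta=0):
    """True if the sampling process still sees the input value from before the edge.

    All quantities are delta-cycle counts at one simulation time. The process
    wakes clk_path_deltas after the edge; the new input value reaches the signal
    it reads data_path_deltas after the port changes.
    """
    sample_at = clk_edge_delta + clk_path_deltas
    new_value_at = data_change_delta + data_path_deltas
    return new_value_at > sample_at


# A model that samples its ports directly on the edge tolerates a change one delta later.
print(sees_old_value(0, 0, 1))  # True
# IDELAYE2 as described: clock path 2 deltas, ld path 1 delta.
print(sees_old_value(2, 1, 1))  # False: 'ld' falling 1 delta after the edge is missed
print(sees_old_value(2, 1, 2))  # True: user-side workaround, hold 'ld' one more delta
# Proposed model-side fix: delay the input ports as many deltas as the clock.
print(sees_old_value(2, 2, 1))  # True
```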


Cheers,
Carl
On Thursday, April 3, 2014 9:01:34 AM UTC-4, Carl wrote:
> *Conceptual*
>
> I would argue that for a well-behaved synchronous RTL entity, the following must be true:
>
> *All readings of the input ports must be made *on* the delta of the rising flank of the clock - not one or any other number of deltas after that.*
>
> Would people agree on that?
I would not agree. The conceptual reasoning is as follows:
- The clock causes something to happen.
- Something that causes 'something else' to happen must precede 'something else', because this is a causal world we live in.
> It follows from the possibility of other logic, hierarchically above the entity in question, altering the input ports as little as one delta after the rising flank. That must be allowed.
Hierarchy does not alter signals. You can go through as many levels of hierarchy as you want and it will not change the time (including simulation delta time) at which a signal changes. What *will* change that time are statements such as 'clk_out <= clk_in', but that is because a new signal called 'clk_out' has been created, and that is *not* the same thing as 'clk_in'...since we live in a causal world, 'clk_out' must occur after 'clk_in'. Granted, a synthesizer will optimize the statement away and just use 'clk_in' wherever 'clk_out' goes, but that is a different tangent from what you're asking.

Kevin Jennings
On Thursday, April 3, 2014 at 15:24:49 UTC+2, KJ wrote:
> On Thursday, April 3, 2014 9:01:34 AM UTC-4, Carl wrote:
> > I would argue that for a well-behaved synchronous RTL entity, the following must be true:
> >
> > *All readings of the input ports must be made *on* the delta of the rising flank of the clock - not one or any other number of deltas after that.*
> >
> > Would people agree on that?
>
> I would not agree, conceptual reasoning is as follows:
> - The clock causes something to happen
> - Something that causes 'something else' to happen must precede 'something else' because this is a causal world we live in.

I don't really get what your two points mean in this context. I do understand and agree with their literal meaning.

I don't think those points necessarily address my issue. My issue doesn't only relate to causality. The main problem is to determine *exactly when something is sampled*.

Since you don't agree with the statement, however: how then should synchronous elements communicate with each other? If I clock a unit with 'clk', and I can't expect that unit to sample the input ports (which I drive) on (exactly on, without any delta delays) the rising edge of 'clk', then how long after the edge must I hold the input data stable? One delta? Two, ten? One ps, one ns?

(If the answer is anything more than deltas, i.e. involving time, we are no longer in functional modelling, which was an assumption for this question.)

Or how would you suggest the problem I illustrated should be avoided?
> > It follows from the possibility of other logic, hierarchically above the entity in question, altering the input ports as little as one delta after the rising flank. That must be allowed.
>
> Hierarchy does not alter signals. You can go through as many levels of hierarchy as you want and it will not change the time (including simulation delta time) that a signal changes. What *will* change that time are statements such as 'clk_out <= clk_in' but that is because a new signal called 'clk_out' has been created and that is *not* the same thing as 'clk_in'...since we live in a causal world, 'clk_out' must occur after 'clk_in'.

Well, of course I agree with all that. This is not about hierarchy; maybe that was bad wording on my part. This is about how you should expect functional, synchronous elements (possibly developed by others) to behave.
> Granted a synthesizer will optimize away the statement and just use 'clk_in' wherever 'clk_out' goes, but that is a different tangent than what you're asking.

Yes, that's something else. A synthesis tool knows about the clocks and signals and warns about any setup/hold time violations. My question regards ideal functional models: ideal and functional in the sense that delays are not modelled (rather trusting the deltas to keep track of event ordering). If delays were modelled as well, these problems would not arise.
On Thursday, April 3, 2014 10:42:56 AM UTC-4, Carl wrote:
> I don't really get what your two points mean in this context. I do understand and agree on the literal meaning of them.
>
> I don't think those points necessarily address my issue. My issue doesn't only relate to causality. The main problem is to determine *exactly when something is sampled*.
>
> Since you don't agree with the statement however; how then should synchronous elements communicate with each other? If I clock a unit with 'clk', and I can't expect that unit to sample the input ports (which I drive) on (exactly on, without any delta delays) the rising edge of 'clk', then how long after the edge must I hold the input data stable? One delta? Two, ten? One ps, one ns?
Actually, I misread your actual question a bit. I do agree that inputs should get sampled on only one simulation delta cycle...and they do. For some reason, I thought you were talking about outputs being generated.

In any case, your conceptual question doesn't relate to the problem that you are seeing with the Xilinx primitive. I have no idea whether it correctly models the primitive or not, but let's assume for a moment that it is correct. Since that primitive is attempting to model reality, there may very well be a delay between the input clock to that primitive and when that primitive actually samples input signals. If that is the situation, then inputs must also model reality, in that they cannot be changing instantaneously either. Inputs to such a model must meet the setup/hold constraints of the design.

When you're performing functional simulation, there can be an assumption that you can ignore setup/hold time issues. This is an invalid assumption if you include parts in your model that model reality, where delays do occur. The model is not wrong in that case; it is your usage of that model.

Just like on a physical board, on the input side of such a model you need to ensure that you do not violate setup or hold constraints. If you do, a physical board will not always work, and in a simulation environment your simulation will fail (which is what you're experiencing). On the output side of a model, you need to make sure that you're not sampling too early (i.e. sooner than the Tco min).

Kevin Jennings
KJ wrote:
> When you're performing functional simulation, there can be an assumption that you can ignore setup/hold time issues. This is an invalid assumption if you include parts into your model that model reality where delays do occur. The model is not wrong in that case, it is your usage of that model.
Then perhaps the error in the Xilinx case is that they are applying a physical model when you call up a behavioral simulation. I remember that the BRAM models (at least for VHDL) had a similar issue, causing the behavioral simulation to look as if the readout was not registered unless you had some delay on the address inputs.

--
Gabor
On Thursday, April 3, 2014 6:01:34 AM UTC-7, Carl wrote:
> I would argue this is a (serious) flaw of the Xilinx primitive. Would people agree on that as well?
I would agree with Kevin's assessment and offer an easy solution. As soon as you involve vendor-supplied models, you might as well just assume that they are not purely behavioral in the sense you are describing. The easy way to deal with this is to move the edges of stimulus signals in test benches to the falling edge of the clock, and to ensure your clock is running in simulation at an appropriate period, as it would in the real hardware.
matt.lettau@gmail.com wrote:
> I would agree with Kevin's assessment and offer an easy solution. As soon as you involve vendor supplied models you might as well just assume that they are not purely behavioral in the sense you are describing. The easy way to deal with this is to move edges of stimulus signals in test benches to the falling edge of the clock, and to ensure your clock is running in simulation at an appropriate time period as it would in the real hardware.
The problem with that approach is that the vendor IP is driven by user IP and not by the test bench directly. You certainly don't want the user IP (for synthesis) working on the opposite clock edge. In the past I have worked around the Xilinx model issues by adding unit delays in the code that instantiates it, but even that leaves a bad taste in my mouth, as it shouldn't be necessary for behavioral simulation.

--
Gabor
On Friday, April 4, 2014 12:01:33 PM UTC-4, Gabor wrote:
> The problem with that approach is that the vendor IP is driven by user IP and not the test bench directly.
I didn't see anything in the OP indicating whether the driving signals were testbench or design...but you could be right.
> You certainly don't want the user IP (for synthesis) working on the opposite clock edge. In the past I have worked around the Xilinx model issues by adding unit delays in the code that instantiates it, but even that leaves a bad taste in my mouth, as it shouldn't be necessary for behavioral simulation.
Again, the way to fight a model that tries to model reality is with more 'reality' of your own. Make the assignments to signals that connect to the primitive be delayed by 1 ns (i.e. "a <= b after 1 ns;"). Synthesis tools ignore the 'after' clause; simulation does not.

I agree that you shouldn't have to do this when you're simulating the original design sources (but I thought he was simulating a post-route design being driven by a testbench). It's ugly, but I guess that is part of the baggage that comes with Brand X...maybe switch to Brand A and see if the laundry comes out cleaner.

Kevin Jennings
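Both time-based workarounds in this thread, the `after 1 ns` on signals feeding the primitive and Matt's earlier suggestion of driving stimulus on the falling edge, work for the same reason: a simulator orders events by (time, delta), and delta cycles never advance simulation time, so any nonzero time offset lands after any number of deltas at the edge. A sketch of that comparison, assuming a 10 ns clock period and the 2-delta clock path and 1-delta data path described in the original post; `sees_old_value_timed`, `PERIOD_PS`, `EDGE` and `stimuli` are hypothetical names.

```python
PERIOD_PS = 10_000  # assumed 100 MHz clock
EDGE = (0, 0)       # rising edge of c as (time in ps, delta cycle)


def sees_old_value_timed(edge, clk_path_deltas, change, data_path_deltas):
    """True if the sampling process still sees the input value from before the edge.

    Simulation points are (time_ps, delta) tuples. Python compares tuples
    lexicographically, which matches the order in which a simulator processes them.
    """
    sample_at = (edge[0], edge[1] + clk_path_deltas)
    new_value_at = (change[0], change[1] + data_path_deltas)
    return new_value_at > sample_at


stimuli = {
    "same edge, no after clause": (0, 1),          # flop output one delta after c
    "falling-edge stimulus": (PERIOD_PS // 2, 0),  # testbench drives on the falling edge
    "a <= b after 1 ns": (1_000, 0),               # new value appears at t = 1 ns
}
for name, change in stimuli.items():
    print(f"{name}: {sees_old_value_timed(EDGE, 2, change, 1)}")

# With a real time offset, the number of deltas on the clock path no longer matters.
print(all(sees_old_value_timed(EDGE, k, (1_000, 0), 1) for k in range(1000)))
```

Under these assumptions only the same-edge case reports `False`; the sketch says nothing about which offset is more realistic for a given design.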
On 4/3/2014 1:17 PM, KJ wrote:
> In any case, your conceptual question doesn't relate to the problem that you are seeing with the Xilinx primitive. I have no idea whether it correctly models the primitive or not, but let's assume for a moment that it is correct. Since that primitive is attempting to model reality, there very well would be a delay between the input clock to that primitive and when that primitive actually samples input signals. If that is the situation, then inputs must also model reality in that they cannot be changing instantaneously either. Inputs to such a model must meet the setup/hold constraints of the design.
This is a specious argument. Delta delays are not in any way related to physical delays and are intended to deal with issues in the logic of simulation, not real world physics. If the Xilinx primitive is trying to model timing delays it has done a pretty durn poor job of it since a delta delay is zero simulation time.
> When you're performing functional simulation, there can be an assumption that you can ignore setup/hold time issues. This is an invalid assumption if you include parts into your model that model reality where delays do occur. The model is not wrong in that case, it is your usage of that model.
This model is clearly *not* modeling timing delays. Just read his description of the problem and you will see that.
> Just like on a physical board, on the input side to such a model, you need to insure that you do not violate setup or hold constraints. If you do, then a physical board will not always work, in a simulation environment your simulation will fail (which is what you're experiencing). On the output side of a model, you need to make sure that you're not sampling too early (i.e. sooner than the Tco min).
This discussion is not at all about setup or hold times. The OP is performing functional simulation, which is very much like unit-delay simulation. The purpose of delta delays is to prevent the order of evaluating sequential logic from affecting the outcome. So the output of all logic gets a delta delay (zero simulation time, but delayed logically) so that the output change is indeed causal and cannot affect other sequential elements on that same clock edge.

In fact, this is the classic problem where a logic element is inserted into the clock path for some sequential elements and not others, creating the exact problem the OP is observing. Normally, designers know not to do this. I guess someone at Xilinx was out that day in the training class.

--
Rick
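The classic hazard described above, logic in the clock path of some flops but not others, shows up even with zero-time deltas. A minimal sketch, assuming a two-stage shift register `d -> q1 -> q2` in which `q2` is optionally clocked by a copy made with `clk_g <= clk;`; all names are hypothetical and the delta loop is hand-rolled, not a real simulator.

```python
def shift_register(q2_clock):
    """Two-stage shift register d -> q1 -> q2; returns (q1, q2) after one rising edge.

    q2_clock is "clk" (same clock as q1) or "clk_g", a copy made by "clk_g <= clk;",
    which therefore rises one delta after clk.
    """
    val = {"clk": 0, "clk_g": 0, "d": 1, "q1": 0, "q2": 0}
    pending = {"clk": 1}  # testbench drives the rising edge
    while pending:
        # Update phase: scheduled assignments take effect together.
        prev = dict(val)
        val.update(pending)
        pending = {}
        rose = {n for n in ("clk", "clk_g") if prev[n] == 0 and val[n] == 1}
        # Execute phase: each process schedules its outputs for the next delta.
        if val["clk"] != prev["clk"]:
            pending["clk_g"] = val["clk"]  # clk_g <= clk;
        if "clk" in rose:
            pending["q1"] = val["d"]       # q1 <= d;  on rising_edge(clk)
        if q2_clock in rose:
            pending["q2"] = val["q1"]      # q2 <= q1; on rising_edge(q2_clock)
    return val["q1"], val["q2"]


print(shift_register("clk"))    # (1, 0): q2 takes the old q1, as intended
print(shift_register("clk_g"))  # (1, 1): q2 wakes one delta late and sees the new q1
```

With the derived clock, the data shoots through both stages on a single edge, which is the same kind of one-delta race as the missed `ld` in the IDELAYE2 case, only appearing as extra data instead of lost data.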
On Saturday, April 5, 2014 1:02:16 PM UTC-4, rickman wrote:
> > In any case, your conceptual question doesn't relate to the problem that you are seeing with the Xilinx primitive. I have no idea whether it correctly models the primitive or not, but let's assume for a moment that it is correct. Since that primitive is attempting to model reality, there very well would be a delay between the input clock to that primitive and when that primitive actually samples input signals. If that is the situation, then inputs must also model reality in that they cannot be changing instantaneously either. Inputs to such a model must meet the setup/hold constraints of the design.
>
> This is a specious argument. Delta delays are not in any way related to physical delays and are intended to deal with issues in the logic of simulation, not real world physics.
Nothing at all specious; it is correct. If you're connecting to a block that models delays (and the OP's does), then the solution is to model reality as well on the inputs, in order to meet setup/hold time, as well as to not sample outputs before Tco max. Whether those delays are caused by the model using delta delays or real time delays does not change the fact that the solution I provided is correct. It will be correct whether the offending model uses delta delays or actual post-route delays.
> > When you're performing functional simulation, there can be an assumption that you can ignore setup/hold time issues. This is an invalid assumption if you include parts into your model that model reality where delays do occur. The model is not wrong in that case, it is your usage of that model.
>
> This model is clearly *not* modeling timing delays. Just read his description of the problem and you will see that.
I did read the post, and there are timing delays. Just because the delays are simulation deltas does not make them 'not a delay'. Since the model he is using implements these delays, the user needs to account for that. If you don't want to account for it, then you should use a different model.
> > Just like on a physical board, on the input side to such a model, you need to insure that you do not violate setup or hold constraints. If you do, then a physical board will not always work, in a simulation environment your simulation will fail (which is what you're experiencing). On the output side of a model, you need to make sure that you're not sampling too early (i.e. sooner than the Tco min).
>
> This discussion is not at all about setup or hold times. The OP is performing functional simulation which is very much like unit delay simulation.
I agree that the OP's problem is not about setup or hold times. The workaround/solution I suggested was to add delays in order to conform with setup or hold times, "Just like on a physical board...". My solution has a direct connection with reality (i.e. a physical board with the design programmed in); other solutions might not.

If you're adding something to work around some problem, you're on much firmer ground if there is an actual basis that can be traced back to specifications. On the assumption that the external thing connected to the part being worked around is a physical part, ask yourself whether adding Tpd and Tco delays to that model brings it closer to, or farther away from, a 'true' model of that part.

Someone else posted that they typically worked around this by changing the inputs to be driven by the opposite edge of the clock. That probably works also, but again, ask yourself: does that make the simulation model closer to reality? I don't think so.

Of course, there is also the possibility that the stuff connecting to the Xilinx primitive is itself internal to the device, in which case I suggested adding a 1 ns (or really whatever small non-zero time) delay. Again, inside a real device, the output of a flop will not change in zero time, so adding a small nominal delay as a workaround can be justified as modeling reality.

In any case, the workaround you use should have a rational basis for being the way it is. If the only justification is that 'it was the only way I could get the sim to run', then there is probably a design error being covered up, rather than a model limitation being worked around.

Kevin Jennings