FPGARelated.com
Forums

Metastability mitigation and I/O registers

Started by RCIngham July 18, 2013
On 7/18/2013 2:05 PM, rickman wrote:
>
> Are you doing any logic at all on these data signals? If not, I can't
> imagine there wouldn't be enough delay to provide metastability
> protection. If you aren't doing any logic on these signal chains, why
> would it matter if one of the bits goes metastable?
>
> Think of it like quantum mechanics. The sample that was taken at the
> time of a transition doesn't really "know" if it is a one or a zero. It
> only then matters when you "look" at it with logic. If you are just
> sending the bit somewhere else I can't see how the fact that it is not
> resolved by the next clock edge can have an impact on anything. It just
> makes the next stage metastable and you have a second shot at resolving it.
>
The point is that after the sampling flop you want to treat this signal as synchronous. Now suppose the output of the sampling flop goes metastable for longer than the slack time on its output path. This is where you can no longer treat this as a synchronous signal. If it goes to more than one downstream flop, then you have the possibility that one will sample the transition on this clock edge and the other won't sample the transition until the next clock. This is how FSMs go "zero-hot." Adding a second flop with no logic in between reduces the chance of this happening downstream of the second flop by many orders of magnitude. This is because the metastability event from the first flop would now need to resolve just at the metastability window of the second flop in order to cause any further issue downstream. As I said before, this window can be extremely small, much smaller than the setup/hold window.
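To make the "zero-hot" failure concrete, here is a small Python sketch of a hypothetical two-state one-hot FSM (IDLE, RUN). Each of its two state flops samples the same asynchronous input `go` through its own next-state logic. The function and signal names are invented for illustration. If both flops capture the same value of `go`, the state stays one-hot. If they disagree, the FSM ends up with no state bit set, or with two.

```python
from typing import Tuple

# Hypothetical one-hot FSM with two states, IDLE and RUN. It leaves IDLE
# when the asynchronous input "go" is high. Each state flop has its own
# next-state logic, so each flop captures "go" through its own path.


def next_state(idle: int, run: int,
               go_at_idle_ff: int, go_at_run_ff: int) -> Tuple[int, int]:
    """Return (idle, run) after one clock edge.

    go_at_idle_ff and go_at_run_ff are the values of "go" as captured by the
    IDLE and RUN flops; they can differ only when "go" changes (or is still
    unresolved) close to the clock edge.
    """
    idle_next = idle and not go_at_idle_ff     # stay in IDLE while go is low
    run_next = run or (idle and go_at_run_ff)  # enter RUN when go is high
    return int(idle_next), int(run_next)


if __name__ == "__main__":
    idle, run = 1, 0  # start in IDLE (one-hot)
    for seen_by_idle, seen_by_run in [(0, 0), (1, 1), (1, 0), (0, 1)]:
        print(f"IDLE flop sees go={seen_by_idle}, RUN flop sees go={seen_by_run}"
              f" -> (idle, run) = {next_state(idle, run, seen_by_idle, seen_by_run)}")
```

Putting a synchronizer flop in front of `go` means both next-state flops receive a value that has had time to resolve since the previous edge. The two disagreeing cases then cannot come from the input itself.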
> How fast is your clock and your input transition rate? That will help
> determine how much slack you need to resolve the metastability to a
> vanishingly small probability. I believe you should be able to spec the
> slack time in the path you are talking about. Then you don't care where
> the FFs are, just that the routing meets your timing requirement.
>
"Vanishingly small" may mean one thing to one person and another thing to others. If for example you can say that 1 ns of slack gets you to an event rate of 1 per month, is that "vanishingly small" or do you want to add that second flop and get to something like one event per millennium? I rarely use a two-stage synchronizer, but then again nobody's life depends on the operation of my logic designs.

-- Gabor
Gabor <gabor@szakacs.org> wrote:

(snip on metastability)
 
> The point is that after the sampling flop you want to treat this
> signal as synchronous. Now suppose the output of the sampling
> flop goes metastable for longer than the slack time on its output
> path. This is where you can no longer treat this as a synchronous
> signal. If it goes to more than one downstream flop, then you
> have the possibility that one will sample the transition on
> this clock edge and the other won't sample the transition until
> the next clock.
Yes, but note that the problem is still there even without metastability. If the transition is close to the clock edge, and the delays to the different FFs are (even slightly) different, they can capture the input on different clock edges. Asynchronous FIFOs use Gray-coded pointers to avoid that problem.
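The remark about FIFOs can be illustrated with the standard binary-reflected Gray code. The Python sketch below counts how many bits flip on each increment of a hypothetical 4-bit pointer. A binary increment such as 7 to 8 flips four bits at once, so a receiving clock domain that samples mid-transition can capture a value that is neither the old count nor the new one. Consecutive Gray codes differ in exactly one bit, including the wrap from 15 back to 0 for a power-of-two depth, so the sampled value is always either the old count or the new one.

```python
def bin_to_gray(n: int) -> int:
    """Binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def gray_to_bin(g: int) -> int:
    """Inverse conversion: XOR of g with all of its right shifts."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def bits_changed(a: int, b: int) -> int:
    """Number of bit positions that differ between a and b."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    WIDTH = 4  # hypothetical 4-bit FIFO pointer
    for i in range(2 ** WIDTH):
        j = (i + 1) % (2 ** WIDTH)  # pointer increment with wrap-around
        print(f"{i:2d} -> {j:2d}: binary flips {bits_changed(i, j)} bit(s), "
              f"Gray flips {bits_changed(bin_to_gray(i), bin_to_gray(j))}")
```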
> This is how FSMs go "zero-hot." Adding a second
> flop with no logic between reduces the chance of this happening
> downstream of the second flop by many orders of magnitude. This is
> because now the metastability event from the first flop would need
> to resolve just at the metastability window of the second flop
> in order to cause any further issue downstream. As I said before
> this window can be extremely small, much smaller than the setup/
> hold window.
If the critical path delay is 80% of the clock period, only the remaining 20% is available for metastability to resolve, so the system can fail if the metastability lasts longer than 20% of the clock period.
>> How fast is your clock and your input transition rate? That will help
>> determine how much slack you need to resolve the metastability to a
>> vanishingly small probability. I believe you should be able to spec the
>> slack time in the path you are talking about. Then you don't care where
>> the FFs are, just that the routing meets your timing requirement.
> "Vanishingly small" may mean one thing to one person and another thing
> to others. If for example you can say that 1 ns of slack gets you
> to an event rate of 1 per month, is that "vanishingly small" or do
> you want to add that second flop and get to something like one
> event per millennium? I rarely use a two-stage synchronizer, but
> then again nobody's life depends on the operation of my logic
> designs.
If you have a 100 MHz clock and one event per month, that is one in 2.5e14 clock cycles. As metastability resolves exponentially, with a full cycle in between it will fail about every 2.5e14 to the fifth power cycles, or about once every 4e57 months. That should be long enough for just about everyone.

-- glen
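The back-of-envelope estimate above can be checked with a few lines of Python. The 30-day month is an assumption; it gives about 2.6e14 cycles per month rather than 2.5e14. The fifth-power step is taken from the post as a rough rule and is not derived here.

```python
# Reproduces the estimate above. Assumes a 30-day month and, as in the post,
# that a full extra clock cycle of resolution time raises the number of
# cycles between failures to the fifth power.
CLOCK_HZ = 100e6
SECONDS_PER_MONTH = 30 * 24 * 3600

cycles_per_month = CLOCK_HZ * SECONDS_PER_MONTH      # 1 failure/month today
cycles_between_failures = cycles_per_month ** 5       # with a full cycle added
months_between_failures = cycles_between_failures / cycles_per_month

print(f"cycles per month:         {cycles_per_month:.2e}")
print(f"months between failures:  {months_between_failures:.1e}")
```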
On 7/19/2013 11:33 PM, Gabor wrote:
> On 7/18/2013 2:05 PM, rickman wrote:
>>
>> Are you doing any logic at all on these data signals? If not, I can't
>> imagine there wouldn't be enough delay to provide metastability
>> protection. If you aren't doing any logic on these signal chains, why
>> would it matter if one of the bits goes metastable?
>>
>> Think of it like quantum mechanics. The sample that was taken at the
>> time of a transition doesn't really "know" if it is a one or a zero. It
>> only then matters when you "look" at it with logic. If you are just
>> sending the bit somewhere else I can't see how the fact that it is not
>> resolved by the next clock edge can have an impact on anything. It just
>> makes the next stage metastable and you have a second shot at
>> resolving it.
>>
>
> The point is that after the sampling flop you want to treat this
> signal as synchronous. Now suppose the output of the sampling
> flop goes metastable for longer than the slack time on its output
> path. This is where you can no longer treat this as a synchronous
> signal. If it goes to more than one downstream flop, then you
> have the possibility that one will sample the transition on
> this clock edge and the other won't sample the transition until
> the next clock. This is how FSMs go "zero-hot." Adding a second
> flop with no logic between reduces the chance of this happening
> downstream of the second flop by many orders of magnitude. This is
> because now the metastability event from the first flop would need
> to resolve just at the metastability window of the second flop
> in order to cause any further issue downstream. As I said before
> this window can be extremely small, much smaller than the setup/
> hold window.
Yes, this is all classic metastability stuff. If this is just data, and it is not used to control any FSMs or otherwise fan out to multiple FFs, metastability won't matter. That was my point about it only mattering when the value of the signal is "looked" at. If this is just data being clocked into another FF, it just doesn't matter: the next FF simply gives you more metastability resolution time before the signal reaches some point in the circuit where it does matter.
>> How fast is your clock and your input transition rate? That will help
>> determine how much slack you need to resolve the metastability to a
>> vanishingly small probability. I believe you should be able to spec the
>> slack time in the path you are talking about. Then you don't care where
>> the FFs are, just that the routing meets your timing requirement.
>>
>
> "Vanishingly small" may mean one thing to one person and another thing
> to others. If for example you can say that 1 ns of slack gets you
> to an event rate of 1 per month, is that "vanishingly small" or do
> you want to add that second flop and get to something like one
> event per millennium? I rarely use a two-stage synchronizer, but
> then again nobody's life depends on the operation of my logic
> designs.
I think the term "vanishingly small" is pretty universal. That is why I didn't say "1 failure per month" or "1 failure per millennium". As to what numbers are required, that is up to the designer and the application. How much timing slack is required depends on the failure rate needed, the clock and data rates, and the details of the logic family used.

At one point some folks at Xilinx made a pretty good case for standardizing on 2 ns, since it gives you a failure rate of about 1 in a billion years (or something like that) with 100 MHz clocks and data. I don't recall the exact numbers, but they made it clear that anything longer than 2 ns was gravy for most designs in FPGAs.

More recently, though, I have read here that the newer FPGA families are trending back in the other direction, so that longer slack times may be needed for high-end designs. But it doesn't seem like there is a "metastability voice" at the FPGA companies anymore. Do they even publish the relevant numbers for recent families?

-- Rick
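The dependencies listed above match the single-stage synchronizer MTBF model that vendors commonly quote: MTBF = exp(t_r/tau) / (T0 * f_clk * f_data). Here t_r is the slack available for resolution, and tau and T0 characterize the logic family. The Python sketch below evaluates that model. The tau, T0 and data-rate values are hypothetical placeholders, not figures for any real FPGA. Read only the trend from the output: each additional tau of slack multiplies the MTBF by e.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def synchronizer_mtbf_s(t_resolve_s: float, tau_s: float, t0_s: float,
                        f_clk_hz: float, f_data_hz: float) -> float:
    """Single-flop estimate MTBF = exp(t_r / tau) / (T0 * f_clk * f_data), in seconds."""
    return math.exp(t_resolve_s / tau_s) / (t0_s * f_clk_hz * f_data_hz)


# HYPOTHETICAL constants, chosen only to show the shape of the curve.
# Real tau and T0 values come from the vendor's characterization data.
TAU_S = 50e-12
T0_S = 1e-9
F_CLK_HZ = 100e6
F_DATA_HZ = 10e6   # assumed average input transition rate

if __name__ == "__main__":
    for slack_ns in (0.5, 1.0, 1.5, 2.0):
        mtbf = synchronizer_mtbf_s(slack_ns * 1e-9, TAU_S, T0_S, F_CLK_HZ, F_DATA_HZ)
        print(f"{slack_ns:.1f} ns slack -> MTBF ~ {mtbf / SECONDS_PER_YEAR:.1e} years")
```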
Gentlemen all,

Thank you for an interesting discussion. Just to clarify a few points:
1. It's a ProASIC3 design, so not bleeding-edge technology, but there is
enough data on the Microsemi website to calculate MTBFs if the inter-FF
delay is known.
2. I am assuming that each of these inputs is independent.
3. The system clock is < 10 MHz, but a "rule-of-thumb" that works for
higher frequencies would be nice.
4. The design is flight safety-critical, so I would like the MTBF relating
to metastability to be "a lot of years", but it isn't (so far) a quantified
system requirement.

While I am sorting out the "build from a script, not the GUI" process, I
think that I will add the appropriate sort of timing constraint to my trial
sandbox design, and see where it breaks in both cases. I will then report
back.

Many thanks,
Robert
	   
					
---------------------------------------		
Posted through http://www.FPGARelated.com
Well, as I couldn't work out how to specify the timing constraint(s) I
needed, I did some post-layout simulations instead. I split the bunch of
discretes into two, half with '-register yes' and the others with
'-register no' in the 'set_io' constraint, then swapped the constraints
over for a second build+simulate pass.

In both builds, the arrival at the 'D' input of the second register was
slightly earlier and slightly more consistent for the '-register yes' inputs.
The time delta between the groups was about 150 ps, so it might be worth having.
And I now have a possibly accurate CK-to-D time to put in my metastability
MTBF spreadsheet.

YMMV for other technologies.
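Here is a sketch of how a CK-to-D figure like this might feed an MTBF spreadsheet, under the usual exponential model. The resolution time is what remains of the clock period after the CK-to-D delay and the second register's setup time. An arrival that is earlier by Δt multiplies the MTBF by exp(Δt/tau). All numeric values here (period, CK-to-D, setup, tau) are hypothetical.

```python
import math


def resolution_time_s(clock_period_s: float, ck_to_d_s: float, setup_s: float) -> float:
    """Time the first flop has to resolve before the second flop's setup window."""
    return clock_period_s - ck_to_d_s - setup_s


def mtbf_gain(extra_time_s: float, tau_s: float) -> float:
    """MTBF multiplier for extra resolution time, assuming MTBF is proportional to exp(t_r / tau)."""
    return math.exp(extra_time_s / tau_s)


if __name__ == "__main__":
    # Hypothetical numbers: a 10 MHz clock (100 ns period) and made-up
    # CK-to-D and setup times; substitute post-layout and datasheet values.
    t_r = resolution_time_s(100e-9, 2.0e-9, 0.5e-9)
    print(f"resolution time: {t_r * 1e9:.2f} ns")
    for tau_ps in (30, 50, 100):  # hypothetical tau values
        gain = mtbf_gain(150e-12, tau_ps * 1e-12)
        print(f"tau = {tau_ps} ps: 150 ps earlier arrival multiplies MTBF by {gain:.1f}")
```

With these assumed values the resolution time at a 10 MHz clock is close to 100 ns, so the MTBF is already astronomically large. The 150 ps would mainly matter at much higher clock rates or with much smaller slack.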
	   
					