comp.arch.fpga | Metastability mitigation and I/O registers | page 2

Reply by Gabor ● July 20, 2013

On 7/18/2013 2:05 PM, rickman wrote:
>
> Are you doing any logic at all on these data signals?  If not, I can't
> imagine there wouldn't be enough delay to provide metastability
> protection.  If you aren't doing any logic on these signal chains, why
> would it matter if one of the bits goes metastable?
>
> Think of it like quantum mechanics.  The sample that was taken at the
> time of a transition doesn't really "know" if it is a one or a zero.  It
> only then matters when you "look" at it with logic.  If you are just
> sending the bit somewhere else I can't see how the fact that it is not
> resolved by the next clock edge can have an impact on anything.  It just
> makes the next stage metastable and you have a second shot at resolving it.
>

The point is that after the sampling flop you want to treat this
signal as synchronous.  Now suppose the output of the sampling
flop goes metastable for longer than the slack time on its output
path.  This is where you can no longer treat this as a synchronous
signal.  If it goes to more than one downstream flop, then you
have the possibility that one will sample the transition on
this clock edge and the other won't sample the transition until
the next clock.  This is how FSMs go "zero-hot."  Adding a second
flop with no logic between reduces the chance of this happening
downstream of the second flop by many orders of magnitude.  This is
because now the metastability event from the first flop would need
to resolve just at the metastability window of the second flop
in order to cause any further issue downstream.  As I said before
this window can be extremely small, much smaller than the setup/
hold window.

> How fast is your clock and your input transition rate?  That will help
> determine how much slack you need to resolve the metastability to a
> vanishingly small probability.  I believe you should be able to spec the
> slack time in the path you are talking about.  Then you don't care where
> the FFs are, just that the routing meets your timing requirement.
>

"Vanishingly small" may mean one thing to one person and another thing
to others.  If for example you can say that 1 ns of slack gets you
to an event rate of 1 per month, is that "vanishingly small" or do
you want to add that second flop and get to something like one
event per millennium?  I rarely use a two-stage synchronizer, but
then again nobody's life depends on the operation of my logic
designs.

-- 
Gabor

Reply by glen herrmannsfeldt ● July 20, 2013

Gabor <gabor@szakacs.org> wrote:

(snip on metastability)
 
> The point is that after the sampling flop you want to treat this
> signal as synchronous.  Now suppose the output of the sampling
> flop goes metastable for longer than the slack time on its output
> path.  This is where you can no longer treat this as a synchronous
> signal.  If it goes to more than one downstream flop, then you
> have the possibility that one will sample the transition on
> this clock edge and the other won't sample the transition until
> the next clock.  

Yes, but note that problem is still there even without
metastability. If the transition is close to the clock, and
the delay to the different FFs is (even slightly) different,
they can clock the input on different clock edges.

FIFOs use gray code to resolve that problem.
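As a minimal sketch of why Gray code helps here (in Python, purely for illustration; the function names are my own and not from any FIFO implementation): consecutive Gray-coded counts differ in exactly one bit, so a pointer sampled near a transition is read as either the old or the new value, never as an unrelated one.

```python
def to_gray(n: int) -> int:
    """Convert a binary count to its reflected Gray code."""
    return n ^ (n >> 1)


def from_gray(g: int) -> int:
    """Convert a reflected Gray code value back to a binary count."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def bits_changed(a: int, b: int) -> int:
    """Count the bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    WIDTH = 4  # illustrative pointer width, an assumption
    size = 1 << WIDTH
    for i in range(size):
        nxt = (i + 1) % size
        print(
            f"{i:2d} -> {nxt:2d}: "
            f"binary changes {bits_changed(i, nxt)} bit(s), "
            f"gray changes {bits_changed(to_gray(i), to_gray(nxt))} bit(s)"
        )
```

With a plain binary count, a step such as 7 to 8 changes four bits at once, and each bit may be captured on a different clock edge.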

> This is how FSMs go "zero-hot."  Adding a second
> flop with no logic between reduces the chance of this happening
> downstream of the second flop by many orders of magnitude.  This is
> because now the metastability event from the first flop would need
> to resolve just at the metastability window of the second flop
> in order to cause any further issue downstream.  As I said before
> this window can be extremely small, much smaller than the setup/
> hold window.

If the critical path delay is 80% of the clock period, the
system can fail if the metastability resolution time exceeds
the remaining 20% of the clock period.
 
>> How fast is your clock and your input transition rate?  That will help
>> determine how much slack you need to resolve the metastability to a
>> vanishingly small probability.  I believe you should be able to spec the
>> slack time in the path you are talking about.  Then you don't care where
>> the FFs are, just that the routing meets your timing requirement.
 
> "Vanishingly small" may mean one thing to one person and another thing
> to others.  If for example you can say that 1 ns of slack gets you
> to an event rate of 1 per month, is that "vanishingly small" or do
> you want to add that second flop and get to something like one
> event per millennium?  I rarely use a two-stage synchronizer, but
> then again nobody's life depends on the operation of my logic
> designs.

If you have a 100 MHz clock and one event per month, that is one
in about 2.5e14 clock cycles. As metastability resolves exponentially,
with a full cycle in between it will fail about every 2.5e14
to the fifth power cycles (roughly 1e72 cycles), or about once in
4e57 months. That should be long enough for just about everyone.
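
The arithmetic can be checked with a short Python sketch. It assumes a 30-day month and simply takes the "fifth power" exponent as given above; it does not derive that exponent from any device data.

```python
CLOCK_HZ = 100e6                     # 100 MHz clock, as in the post
SECONDS_PER_MONTH = 30 * 24 * 3600   # assumption: 30-day month
EXPONENT = 5                         # "to the fifth power", taken as given

# Clock cycles in one month; one failure per month means one in this many.
cycles_per_month = CLOCK_HZ * SECONDS_PER_MONTH


def months_between_failures(cycles_per_failure: float) -> float:
    """Convert a failure interval in clock cycles to months."""
    return cycles_per_failure / cycles_per_month


# Interval between failures after the extra cycle of resolution time.
cycles_two_stage = cycles_per_month ** EXPONENT

if __name__ == "__main__":
    print(f"cycles per month:          {cycles_per_month:.3g}")
    print(f"cycles per failure (^5):   {cycles_two_stage:.3g}")
    print(f"months between failures:   "
          f"{months_between_failures(cycles_two_stage):.3g}")
```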

-- glen

Reply by rickman ● July 20, 2013

On 7/19/2013 11:33 PM, Gabor wrote:
> On 7/18/2013 2:05 PM, rickman wrote:
>>
>> Are you doing any logic at all on these data signals? If not, I can't
>> imagine there wouldn't be enough delay to provide metastability
>> protection. If you aren't doing any logic on these signal chains, why
>> would it matter if one of the bits goes metastable?
>>
>> Think of it like quantum mechanics. The sample that was taken at the
>> time of a transition doesn't really "know" if it is a one or a zero. It
>> only then matters when you "look" at it with logic. If you are just
>> sending the bit somewhere else I can't see how the fact that it is not
>> resolved by the next clock edge can have an impact on anything. It just
>> makes the next stage metastable and you have a second shot at
>> resolving it.
>>
>
> The point is that after the sampling flop you want to treat this
> signal as synchronous. Now suppose the output of the sampling
> flop goes metastable for longer than the slack time on its output
> path. This is where you can no longer treat this as a synchronous
> signal. If it goes to more than one downstream flop, then you
> have the possibility that one will sample the transition on
> this clock edge and the other won't sample the transition until
> the next clock. This is how FSMs go "zero-hot." Adding a second
> flop with no logic between reduces the chance of this happening
> downstream of the second flop by many orders of magnitude. This is
> because now the metastability event from the first flop would need
> to resolve just at the metastability window of the second flop
> in order to cause any further issue downstream. As I said before
> this window can be extremely small, much smaller than the setup/
> hold window.

Yes, this is all classic metastability stuff.  If this is just data and
is not used to control any FSMs and does not otherwise branch out to
multiple FFs, metastability won't matter.  That was my point about it only
mattering when the value of the signal is "looked" at.  If this is just
data being clocked into another FF it just doesn't matter; the next FF
just gives you more metastability resolution time before the signal
reaches some point in the circuit where it does matter.

>> How fast is your clock and your input transition rate? That will help
>> determine how much slack you need to resolve the metastability to a
>> vanishingly small probability. I believe you should be able to spec the
>> slack time in the path you are talking about. Then you don't care where
>> the FFs are, just that the routing meets your timing requirement.
>>
>
> "Vanishingly small" may mean one thing to one person and another thing
> to others. If for example you can say that 1 ns of slack gets you
> to an event rate of 1 per month, is that "vanishingly small" or do
> you want to add that second flop and get to something like one
> event per millennium? I rarely use a two-stage synchronizer, but
> then again nobody's life depends on the operation of my logic
> designs.

I think the term vanishingly small is pretty universal.  That is why I 
didn't say "1 failure per month" or "1 failure per millennium".  As to 
what numbers are required, that is up to the designer and the 
application.  How much timing slack is required depends on the failure 
rate needed, the clock and data rates and the details of the logic 
family used.
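
To make that dependence concrete, here is a rough Python sketch using the commonly quoted synchronizer model MTBF = exp(t_r / tau) / (T0 * f_clk * f_data). The values of TAU and T0 below are hypothetical placeholders, not figures for any real logic family; the real ones have to come from the vendor's characterization data.

```python
import math


def mtbf_seconds(t_resolve, tau, t0, f_clk, f_data):
    """Mean time between failures for the given resolution time (seconds)."""
    return math.exp(t_resolve / tau) / (t0 * f_clk * f_data)


def required_slack(target_mtbf, tau, t0, f_clk, f_data):
    """Resolution time (seconds) needed to reach target_mtbf (seconds)."""
    return tau * math.log(target_mtbf * t0 * f_clk * f_data)


# HYPOTHETICAL device constants, for illustration only.
TAU = 100e-12   # resolution time constant, seconds (assumption)
T0 = 1e-10      # metastability window constant, seconds (assumption)

F_CLK = 100e6   # clock rate, Hz
F_DATA = 100e6  # data transition rate, Hz

SECONDS_PER_YEAR = 365.25 * 24 * 3600

if __name__ == "__main__":
    for years in (1 / 12, 1e3, 1e9):
        slack = required_slack(years * SECONDS_PER_YEAR, TAU, T0, F_CLK, F_DATA)
        print(f"target MTBF {years:10.3g} years -> slack {slack * 1e9:.2f} ns")
```

Because the slack appears in the exponent, each additional tau of slack multiplies the MTBF by e, which is why a modest amount of extra slack buys many orders of magnitude.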

At one point some folks at Xilinx made a pretty good case for
standardizing on 2 ns, since it gives you a failure rate of about one in
a billion years (or something like that) with 100 MHz clocks and data.  I
don't recall the exact numbers, but they made it clear that anything
longer than 2 ns was gravy for most designs in FPGAs.  But more recently I
have read here that the newer FPGA families are trending back in the
other direction, so that longer slack times may be needed for high-end
designs.  But it doesn't seem like there is a "metastability voice" at the
FPGA companies anymore.  Do they even publish the relevant numbers for
recent families?

-- 

Rick

Reply by RCIngham ● July 22, 2013

Gentlemen all,

Thank you for an interesting discussion. Just to clarify a few points:
1. It's a ProASIC3 design, so not bleeding edge technology, but there is
enough data on the Microsemi web-site to calculate MTBFs if the inter-FF
delay is known.
2. I am assuming that each of these inputs is independent.
3. The system clock is < 10 MHz, but a "rule-of-thumb" that works for
higher frequencies would be nice.
4. The design is flight safety-critical, so I would like the MTBF relating
to metastability to be "a lot of years", but it isn't (so far) a quantified
system requirement.

While I am sorting out the "build from a script, not the GUI" process, I
think that I will add the appropriate sort of timing constraint to my trial
sand-box design, and see where it breaks for both cases. I will then report
back.

Many thanks,
Robert

---------------------------------------		
Posted through http://www.FPGARelated.com

Reply by RCIngham ● July 23, 2013

Well, as I couldn't work out how to specify the timing constraint(s) I
needed, I did some post-layout simulations instead. I split the bunch of
discretes into two, half with '-register yes' and the others with
'-register no' in the 'set_io' constraint, then swapped the constraints
over for a second build+simulate pass.

In both cases the arrival at the 'D' input of the second register was
slightly earlier and slightly more consistent in the '-register yes' cases.
The time delta between groups was about 150 ps, so it might be worth having.
And I now have a possibly accurate CK-to-D time to put in my Metastability
MTBF spreadsheet.

YMMV for other technologies.
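
For anyone wanting to turn a delta like that into an MTBF ratio, a rough Python sketch follows. It assumes the usual exponential model, in which MTBF scales as exp(t_resolve / tau), and it reads "CK-to-D" as the delay from the clock edge at the first register to data arrival at the second register's D input. The tau values and the example delays are hypothetical placeholders, not ProASIC3 data.

```python
import math


def resolution_time(t_clk, t_ck_to_d, t_setup):
    """Time left for a metastable output to settle before the next capture."""
    return t_clk - t_ck_to_d - t_setup


def mtbf_gain(delta_t, tau):
    """Factor by which MTBF improves when resolution time grows by delta_t."""
    return math.exp(delta_t / tau)


DELTA = 150e-12  # seconds, the measured difference between the two groups

# HYPOTHETICAL resolution time constants, for illustration only.
CANDIDATE_TAUS = (50e-12, 100e-12, 200e-12, 500e-12)

if __name__ == "__main__":
    # Example delays below are placeholders, not measured values.
    t_res = resolution_time(t_clk=100e-9, t_ck_to_d=3e-9, t_setup=1e-9)
    print(f"resolution time at 10 MHz: {t_res * 1e9:.1f} ns")
    for tau in CANDIDATE_TAUS:
        print(f"tau = {tau * 1e12:5.0f} ps -> "
              f"MTBF gain from 150 ps: x{mtbf_gain(DELTA, tau):.2f}")
```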
