Comp.Arch.FPGA | What is the basis on flip-flops replaced by a latch

There are 22 messages in this thread.

You are currently looking at messages 0 to 10.

What is the basis on flip-flops replaced by a latch - Weng Tianxiang - 2010-02-11 15:05:00

Hi,
I finally understand why flip-flops can sometimes be replaced by
latches.

Here is an excerpt from the paper "Atom Processor Core Made FPGA
Synthesizable":
"Optimized for a frequency range from 800 MHz to 1.86 GHz,
the original Atom design makes extensive use of latches
to support time borrowing along the critical timing paths.
With level-sensitive latches, a signal may have a delay larger
than the clock period and may flush through the latches
without causing incorrect data propagation, whereas the delay
of a signal in designs with edge-triggered flip-flops must
be smaller than the clock period to ensure the correctness of
data propagation across flip-flop stages [3]. It is well known
that the static timing analysis of latch-based pipeline designs
with level-sensitive latches is challenging due to two
salient characteristics of time borrowing [2, 3, 14]: (1) a
delay in one pipeline stage depends on the delays in the previous
pipeline stage. (2) in a pipeline design, not only do
the longest and shortest delays from a primary input to a
primary output need to be propagated through the pipeline
stages, but also the critical probabilities that the delays on
latches violate setup-time and hold-time constraints. Such
high dependency across the pipeline stages makes it very
difficult to gauge the impact of correlations among delay
random variables, especially the correlations resulting from
reconvergent fanouts. Due to this innate difficulty, synthesis
tools like DC-FPGA simply do not support latch analysis
and synthesis correctly."

In short, a pipeline of several FFs can be replaced by a pipeline with
FFs only at the two ends and ordinary latches inserted between them, so
that stages can borrow ("steal") time slack from one another.

FF1 ---> FF2 ---> FF3 ---> FF4
FF1 ------->l2 --------> l3--> FF4.
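To make the time-borrowing idea concrete, here is a minimal Python sketch of a setup-only timing check along the two chains above. It assumes an idealized model (zero setup, clock-to-Q and latch delays, no skew), treats a flip-flop as a latch whose transparent window has zero width, and uses made-up numbers: a 10-unit clock period, stage delays of 13, 6 and 8 units, and hypothetical windows for l2 and l3. The function name `propagate` is illustrative only.

```python
from typing import List, Optional, Tuple

# (open, close) time of an element's transparent window, in arbitrary time units.
# A flip-flop is modelled as a zero-width window: it samples and launches at one instant.
Window = Tuple[int, int]


def propagate(delays: List[int], windows: List[Window],
              launch: int = 0) -> Tuple[List[int], Optional[int]]:
    """Setup-only check of a chain of latches/flops (idealized, hypothetical model).

    delays[i] is the combinational delay into element i; windows[i] is its window.
    Returns the departure time from each element that passed, and the index of the
    first element whose window closed before the data arrived (None if all passed).
    Min-delay (hold / race-through) conditions are NOT checked.
    """
    depart = launch
    departures: List[int] = []
    for i, (delay, (t_open, t_close)) in enumerate(zip(delays, windows)):
        arrival = depart + delay
        if arrival > t_close:
            return departures, i  # setup violation at element i
        # Early data waits for the window to open; late data flows straight through,
        # borrowing the time it is late from the next stage.
        depart = max(arrival, t_open)
        departures.append(depart)
    return departures, None


if __name__ == "__main__":
    delays = [13, 6, 8]                        # stage 1 is longer than one period
    flops = [(10, 10), (20, 20), (30, 30)]     # FF2, FF3, FF4 on rising edges
    latches = [(10, 15), (15, 20), (30, 30)]   # l2, l3 (assumed windows), FF4
    print(propagate(delays, flops))            # ([], 0): FF2 misses setup
    print(propagate(delays, latches))          # ([13, 19, 30], None): fits in 3 cycles
```

With flops, the 13-unit stage fails at FF2 no matter how fast the later stages are; with latches, the 3 units it overruns are absorbed by the shorter stages that follow, and FF4 still captures at t = 30. The sketch deliberately ignores min-delay (race-through) checks, which a real latch-based analysis must also perform.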

I had seen such circuits before but had not realized the underlying
reason.  From the paper above, I now know that the technique is not
new; it originated in the 1980s.

Weng




Re: What is the basis on flip-flops replaced by a latch - Patrick Maupin - 2010-02-11 20:33:00

Yes, latch-based design is much older than flop-based design, for the
simple reason that it can be cheaper.  Think about it -- every flop is
really two latches!  (At least for static designs that can be clocked
down to DC...)  Where I work (at a chip company), we're still
occasionally converting latch-based designs into flop-based ones.
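The "every flop is really two latches" point can be illustrated with a small behavioural model. This Python sketch uses hypothetical class names, zero-delay evaluation, one sample per half clock period, and data that changes at the same instants as the clock. It builds a rising-edge flip-flop from a master latch transparent while the clock is low and a slave latch transparent while it is high, and compares it with a single latch.

```python
class DLatch:
    """Level-sensitive D latch: follows d while enable is true, holds otherwise."""

    def __init__(self) -> None:
        self.q = 0

    def eval(self, d: int, enable: bool) -> int:
        if enable:
            self.q = d
        return self.q


class MasterSlaveFF:
    """Rising-edge D flip-flop built from two latches on opposite clock levels."""

    def __init__(self) -> None:
        self.master = DLatch()  # transparent while clk is low
        self.slave = DLatch()   # transparent while clk is high

    def eval(self, d: int, clk: int) -> int:
        # Evaluate the master first; when clk has just risen it is already closed
        # and still holds the value d had before the edge.
        m = self.master.eval(d, not clk)
        return self.slave.eval(m, bool(clk))


if __name__ == "__main__":
    clk = [0, 1, 0, 1, 0, 1]
    d = [1, 1, 0, 0, 1, 0]
    ff, latch = MasterSlaveFF(), DLatch()
    print([ff.eval(di, ci) for di, ci in zip(d, clk)])           # [0, 1, 1, 0, 0, 1]
    print([latch.eval(di, bool(ci)) for di, ci in zip(d, clk)])  # [0, 1, 1, 0, 0, 0]
```

In the last sample the flop still outputs the 1 that was present just before the rising edge, while the transparent latch passes the new 0 straight through.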

But (and this is a big but) FPGAs themselves (not just the design
tools) are designed for flop-based design, so if you use latch-based
designs with FPGAs you are not only stressing the timing tools, you
are also avoiding the nice, packaged, back-to-back dedicated latches
they give you called flops.

Pat



______________________________
Join the blogging team on FPGARelated.com and earn rewards! Details Here.

Re: What is the basis on flip-flops replaced by a latch - glen herrmannsfeldt - 2010-02-11 21:33:00

In comp.arch.fpga Patrick Maupin
<p...@gmail.com> wrote:

> Yes, latch-based design is much older than flop-based design, for the
> simple reason that it can be cheaper.  Think about it -- every flop is
> really two latches!  (At least for static designs that can be clocked
> down to DC...)  Where I work (at a chip company), we're still
> occasionally converting latch-based designs into flop-based ones.

Often using a two-phase (or multi-phase) clock.  Some latches are
transparent on one phase, some on the other.  With appropriately
non-overlapping phases, one avoids race conditions, and the timing
isn't so hard to get right.
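A two-phase non-overlapping clock of this kind can be sketched as sampled waveforms. This minimal Python illustration uses assumed parameters (an even period in time steps and a dead time `gap` after each phase; the function name is hypothetical). The two phases are never high at the same time, so a latch on phi1 feeding a latch on phi2 is never transparent together with it, provided the gap exceeds the clock skew.

```python
from typing import List, Tuple


def two_phase_clocks(period: int, gap: int, cycles: int = 1) -> Tuple[List[int], List[int]]:
    """Sampled two-phase non-overlapping clocks.

    phi1 is high for the first half-period minus `gap` steps, phi2 for the second
    half minus `gap` steps, leaving `gap` dead steps after each phase.
    """
    half = period // 2
    if period % 2 or not 0 < gap < half:
        raise ValueError("need an even period and 0 < gap < period / 2")
    phi1: List[int] = []
    phi2: List[int] = []
    for t in range(period * cycles):
        p = t % period
        phi1.append(1 if p < half - gap else 0)
        phi2.append(1 if half <= p < period - gap else 0)
    return phi1, phi2


if __name__ == "__main__":
    phi1, phi2 = two_phase_clocks(period=8, gap=1, cycles=2)
    print(phi1)  # [1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0]
    print(phi2)  # [0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0]
    # No sample has both phases high.
    print(any(a and b for a, b in zip(phi1, phi2)))  # False
```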

> But (and this is a big but) FPGAs themselves (not just the design
> tools) are designed for flop-based design, so if you use latch-based
> designs with FPGAs you are not only stressing the timing tools, you
> are also avoiding the nice, packaged, back-to-back dedicated latches
> they give you called flops.

Well, you could use a sequence of FFs clocked on different clock
edges, or on the same edge of two different clocks.

That allows for some of the advantages.  If there was enough demand,
I suppose FPGA companies would build transparent latch based devices.
(Who remembers the 7475?)

In pipelined processors of years past the Earle latch combined one
level of logic with the latch logic, reducing the latch delay.

-- glen

Re: What is the basis on flip-flops replaced by a latch - rickman - 2010-02-12 11:32:00

On Feb 11, 3:05 pm, Weng Tianxiang
<wtx...@gmail.com> wrote:
<snip>
> In short, a pipeline with several FFs can be replaced with a pipeline
> with two FFs in the ends and normal latches inserted between them to
> steal time slack.
>
> FF1 ---> FF2 ---> FF3 ---> FF4
> FF1 ------->l2 --------> l3--> FF4.
<snip>

I'm a little unclear on how this works.  Is this just a matter of the
outputs of the latches settling earlier if the logic path is faster so
that the next stage actually has more setup time?  This requires that
there be a minimum delay in any given path so that the correct data is
latched on the current clock cycle while the result for the next clock
cycle is still propagating through the logic.  I can see where this
might be helpful, but it would be a nightmare to analyze in timing,
mainly because of the wide range of delays with process, voltage and
temperature (PVT).  I have been told you need to allow a 2:1 range when
considering all three.

I think similar issues are involved when considering async design (or
more accurately termed self-timed).  In that design method the
variations in delay affect the timing of both the data path and the
clock path, so they largely cancel out and the min delays do not need
to include the full 2:1 range relative to the max.  Some amount
of slack time must be given so the clock arrives after the data, but
otherwise all the speed of the logic is utilized at all times.  This
also is supposed to provide for lower noise designs because there is
no chip wide clock giving rise to simultaneous switching noise.  Self-
timed logic does not really result in significant increases in
processing speed because although the max speed can be faster, an
application can never rely on that faster speed being available.  But
for applications where there is optional processing that can be done
using the left over clock cycles (poor term in this case, but you know
what I mean) it can be useful.

In the case of using latches in place of registers, the speed gains
are always usable.  But can't the same sort of gains be made by
register leveling?  If you have logic that is slower than a clock
cycle followed by logic that is faster than a clock cycle, why not
just move some of the slow logic across the register to the faster
logic section?
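Register leveling on a straight pipeline amounts to choosing where the registers sit in a chain of gates so that the slowest stage is as fast as possible. A small dynamic-programming sketch in Python shows the idea, assuming integer gate delays in arbitrary units and ignoring register overhead (the function name is hypothetical). It also exposes the granularity limit discussed later in the thread: a register can only sit between whole gates.

```python
from functools import lru_cache
from typing import List, Tuple


def level_registers(gate_delays: List[int], stages: int) -> Tuple[int, List[List[int]]]:
    """Split a linear chain of gates into `stages` contiguous pipeline stages
    (one register between neighbouring stages) minimizing the worst stage delay."""
    n = len(gate_delays)
    if not 1 <= stages <= n:
        raise ValueError("need 1 <= stages <= number of gates")
    prefix = [0]
    for d in gate_delays:
        prefix.append(prefix[-1] + d)

    @lru_cache(maxsize=None)
    def best(i: int, k: int) -> Tuple[int, Tuple[int, ...]]:
        """Best (worst-stage delay, stage end indices) for gates[i:] in k stages."""
        if k == 1:
            return prefix[n] - prefix[i], (n,)
        options = []
        # The first stage ends at j; leave at least one gate per remaining stage.
        for j in range(i + 1, n - k + 2):
            tail_cost, tail_ends = best(j, k - 1)
            options.append((max(prefix[j] - prefix[i], tail_cost), (j,) + tail_ends))
        return min(options)

    worst, ends = best(0, stages)
    starts = (0,) + ends[:-1]
    return worst, [gate_delays[a:b] for a, b in zip(starts, ends)]


if __name__ == "__main__":
    # Register after the 3rd gate: stages of 12 and 3 units, worst = 12.
    # Moving one slow gate across the register balances the stages.
    print(level_registers([4, 4, 4, 1, 1, 1], 2))  # (8, [[4, 4], [4, 1, 1, 1]])
```

Even at the optimum the worst stage is 8 units rather than the ideal 7.5, because the register cannot sit in the middle of a gate.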

Rick

Re: What is the basis on flip-flops replaced by a latch - Patrick Maupin - 2010-02-12 22:26:00

On Feb 11, 8:33 pm, glen herrmannsfeldt
<g...@ugcs.caltech.edu> wrote:
> In comp.arch.fpga Patrick Maupin <pmau...@gmail.com> wrote:
>
> > But (and this is a big but) FPGAs themselves (not just the design
> > tools) are designed for flop-based design, so if you use latch-based
> > designs with FPGAs you are not only stressing the timing tools, you
> > are also avoiding the nice, packaged, back-to-back dedicated latches
> > they give you called flops.
>
> Well, you could use a sequence of FF's, clocking on different clock
> edges, or the same edge of two clocks.
>

I actually did this in Xilinx FPGAs back in 1999.  The specific
problem I was solving was an insufficient number of global clocks (a
lot of interconnects with source-based clocking).  Xilinx has
solutions for this now (regional clocks), but not back then.  So I
used regular interconnect for clocking, and that was very high skew,
so that you couldn't guarantee that the same edge was, in fact, the
same edge for all the flops on the clock.

The solution was to do as you said -- the inputs to every flop were
from flops clocked on the opposite edge.  That, and reducing the
amount of logic in that clock domain and clock-crossing to a "real"
clock domain as soon as possible.

Re: What is the basis on flip-flops replaced by a latch - Patrick Maupin - 2010-02-12 22:35:00

On Feb 12, 10:32 am, rickman
<gnu...@gmail.com> wrote:

> In the case of using latches in place of registers, the speed gains
> are always usable.  But can't the same sort of gains be made by
> register leveling?  If you have logic that is slower than a clock
> cycle followed by logic that is faster than a clock cycle, why not
> just move some of the slow logic across the register to the faster
> logic section?

That's a similar technique, to be sure, for speed gains.  But as I
wrote in an earlier post, I think the primary motivation for latch-
based design was originally cost.  For example, since each flop is
really two latches, if you are going to have logic which ANDs together
the output of two flops, you could replace that with ANDing the output
of two latches, and outputting that result through another latch, for
a net savings of 75% of the latches.


Re: What is the basis on flip-flops replaced by a latch - Weng Tianxiang - 2010-02-13 02:01:00

On Feb 12, 7:35 pm, Patrick Maupin
<pmau...@gmail.com> wrote:
> On Feb 12, 10:32 am, rickman <gnu...@gmail.com> wrote:
>
> > In the case of using latches in place of registers, the speed gains
> > are always usable.  But can't the same sort of gains be made by
> > register leveling?  If you have logic that is slower than a clock
> > cycle followed by logic that is faster than a clock cycle, why not
> > just move some of the slow logic across the register to the faster
> > logic section?
>
> That's a similar technique, to be sure, for speed-gains.  But as I
> wrote in an earlier post, I think the primary motivation for latch-
> based design was originally cost.  For example, since each flop is
> really two latches, if you are going to have logic which ANDs together
> the output of two flops, you could replace that with ANDing the output
> of two latches, and outputting that result through another latch, for
> a net savings of 75% of the latches.

Your method's goal and the goal of CPU designers who insert latches
into a pipeline are totally different.

They use latches where a combinational path delay is too long to fit
within one clock cycle but too short to need two clock cycles in a
pipeline, not at arbitrary places of their choosing.

Weng

Re: What is the basis on flip-flops replaced by a latch - John_H - 2010-02-13 09:00:00

On Feb 12, 11:32 am, rickman
<gnu...@gmail.com> wrote:
<snip>
>
> In the case of using latches in place of registers, the speed gains
> are always usable.  But can't the same sort of gains be made by
> register leveling?  If you have logic that is slower than a clock
> cycle followed by logic that is faster than a clock cycle, why not
> just move some of the slow logic across the register to the faster
> logic section?
>
> Rick

I argued with my coworker for a few days about the benefit of latches
versus registers before I finally realized the advantage of
latch-based designs.  Not only is granularity less of a problem (e.g., only
able to fit 2 logic delays in a level rather than the maximum 2.8
available, losing nearly 30%) but synchronous delays are different.
Rather than accounting for Tco+Tsu for every register in a chain of a
few clock cycles where register leveling is helpful, only the Tito
transparent latch delay (minus the Tilo LUT delay) needs to be added
for each latch in the chain [using Xilinx timing nomenclature].
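The granularity argument can be checked with simple integer arithmetic. The Python sketch below uses made-up numbers in picoseconds, not datasheet values: a 3300 ps period, 1000 ps per logic level, 500 ps for Tco+Tsu (so 2.8 levels' worth of time per cycle, as in the example above), and an assumed 200 ps extra per latch standing in for Tito minus Tilo. It also assumes each internal register is replaced by two latches. Function names are hypothetical.

```python
def flop_chain_levels(cycles: int, period: int, t_logic: int, t_co_su: int) -> int:
    """Logic levels that fit when every cycle boundary is a register.

    Each stage pays Tco+Tsu and can only hold a whole number of logic levels.
    """
    return cycles * ((period - t_co_su) // t_logic)


def latch_chain_levels(cycles: int, period: int, t_logic: int,
                       t_co_su: int, t_latch_extra: int) -> int:
    """Logic levels that fit when each internal register becomes two latches.

    Only the end registers pay Tco+Tsu; each internal latch adds t_latch_extra
    (assumed to stand for Tito minus Tilo), and levels may straddle boundaries.
    """
    internal_latches = 2 * (cycles - 1)
    budget = cycles * period - t_co_su - internal_latches * t_latch_extra
    return budget // t_logic


if __name__ == "__main__":
    args = dict(cycles=4, period=3300, t_logic=1000, t_co_su=500)
    print(flop_chain_levels(**args))                      # 8  (2 whole levels per cycle)
    print(latch_chain_levels(**args, t_latch_extra=200))  # 11
```

With these assumed numbers the register chain wastes 0.8 of a level in every cycle, while the latch chain only rounds down once over the whole path.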

I agree that the register based FPGAs are probably designed (and
tested) to minimize Tsu and Tco without strong consideration for Tito
and that the timing analysis is NOT set up to do a good job with
"latch leveled" timing analysis.

When I do use latches (when transferring data between rising/falling
time domains for a fast clock, for instance) I have to specify false
values around the latch for synchronous analysis rather than the
precise values through the latch because the analysis wants to see
registers at each stage even with the proper analysis flag turned on.
If the analyzer would recognize a chain of rise/fall/rise/fall
controlled latches and automatically increase the timing constraint by
a half period for each stage, we'd potentially have a powerful tool at
our disposal.  But they don't, so we don't.  At least not in FPGAs.

- John_H

Re: What is the basis on flip-flops replaced by a latch - glen herrmannsfeldt - 2010-02-13 15:09:00

In comp.arch.fpga John_H
<n...@johnhandwork.com> wrote:
(snip)
 
> I argued with my coworker for a few days about the benefit of latches
> versus registers before I finally realized the advantage of latch
> based designs.  Not only is granularity less of a problem (e.g., only
> able to fit 2 logic delays in a level rather than the maximum 2.8
> available, losing nearly 30%) but synchronous delays are different.
> Rather than accounting for Tco+Tsu for every register in a chain of a
> few clock cycles where register leveling is helpful, only the Tito
> transparent latch delay (minus the Tilo LUT delay) needs to be added
> for each latch in the chain [using Xilinx timing nomenclature].

I would have thought that they were fast enough now for that
not to matter so much.  My thought would be that clock skew,
even with the fancy clock distribution system, would be the important
factor.  

If the granularity is the problem then you might try clocking
some on rising and some on falling edge (if available) or having
two clocks with known phase difference.  That would be especially
true if the DLLs could generate the appropriate clocks.
 
> I agree that the register based FPGAs are probably designed (and
> tested) to minimize Tsu and Tco without strong consideration for Tito
> and that the timing analysis is NOT set up to do a good job with
> "latch leveled" timing analysis.
 
> When I do use latches (when transferring data between rising/falling
> time domains for a fast clock, for instance) I have to specify false
> values around the latch for synchronous analysis rather than the
> precise values through the latch because the analysis wants to see
> registers at each stage even with the proper analysis flag turned on.
> If the analyzer would recognize a chain of rise/fall/rise/fall
> controlled latches and automatically increase the timing constraint by
> a half period for each stage, we'd potentially have a powerful tool at
> our disposal.  But they don't so we don't.  At least not in FPGAs.

That sounds useful.  If it gets popular enough, maybe they
will add it.

-- glen

Re: What is the basis on flip-flops replaced by a latch - John_H - 2010-02-13 19:21:00

On Feb 13, 3:09 pm, glen herrmannsfeldt
<g...@ugcs.caltech.edu> wrote:
<snip>
> > Rather than accounting for Tco+Tsu for every register in a chain of a
> > few clock cycles where register leveling is helpful, only the Tito
> > transparent latch delay (minus the Tilo LUT delay) needs to be added
> > for each latch in the chain [using Xilinx timing nomenclature].
>
> I would have thought that they were fast enough now for that
> not to matter so much.  My thought would be that clock skew,
> even with the fancy clock distribution system, would be the important
> factor.

Clock skew becomes entirely unimportant in the latch scheme as I know
it unless CLK and CLK180 are used instead of normal and inverted
versions of the same clock.  The latches are explicitly alternated
posedge/negedge/posedge/negedge effectively decomposing a conceptual
register into its two latches and balancing the logic between them.
For clock skew to be an issue, two consecutive latches would have to
be transparent long enough for the logic path plus delays to sneak
through; that won't happen when using the normal and invert of the
*same* clock net unless things are very, very wrong in the latch
design.

> If the granularity is the problem then you might try clocking
> some on rising and some on falling edge (if available) or having
> two clocks with known phase difference.  That would be especially
> true if the DLL's could generate the appropriate clocks.

Some... registers?  Using the posedge and negedge in a registered
arrangement would simply exacerbate the granularity problem, able to
fit fewer whole delays into the same clock period by dividing the
logic into two phases.  The latches allow longer delays to move the
valid data further toward the end of the transparent window and
shorter delays to move it back, always with the safeguard that data
for the next (half) cycle isn't allowed to be valid any sooner than
the front edge of the transparent window.
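The contrast between posedge/negedge registers and alternating latches can be sketched as follows. This is a minimal Python model under stated assumptions, not how any vendor's analyzer works: a launch register fires at t = 0, internal latch k is transparent from (k-1)*half to k*half (closing exactly when the equivalent half-cycle register would capture), the final register captures at N*half, and latch delay, setup, hold and skew are all zero. Function names and delay values are hypothetical.

```python
from typing import List, Optional


def half_cycle_register_failures(delays: List[int], half: int) -> List[int]:
    """Stages that miss timing when every boundary is a posedge/negedge register:
    each stage must fit within its own half period."""
    return [i for i, d in enumerate(delays) if d > half]


def latch_chain_failure(delays: List[int], half: int) -> Optional[int]:
    """Index of the first failing stage when internal boundaries are alternating
    latches (idealized model described above), or None if the path meets timing."""
    depart = 0
    for k, d in enumerate(delays, start=1):
        arrival = depart + d
        if arrival > k * half:  # window closed (or final capture) before data arrived
            return k - 1
        # Safeguard: data never leaves a latch before its window opens at (k-1)*half.
        depart = max(arrival, (k - 1) * half)
    return None


if __name__ == "__main__":
    delays = [2, 7, 3, 6]  # some stages longer than half of a 10-unit period
    print(half_cycle_register_failures(delays, half=5))  # [1, 3]
    print(latch_chain_failure(delays, half=5))           # None
```

The 7-unit and 6-unit stages fail as half-cycle register stages, but through latches the slack left by the short first and third stages is passed forward to them. The `max(arrival, ...)` line is the safeguard that keeps data from leaving a latch before its window opens.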

The description comes out a little muddy, which is why it took me a few
days to buy into the whole concept.  It's sweet!  It just takes some
timing diagrams and head scratching.  And it's certainly not set up
for proper analysis especially in the Xilinx tools where I
experimented with the phase domain changes.

- John_H
