FPGARelated.com

Pipelining on Multiple Clock Edges

Started by rickman May 13, 2017
I recall a processor implementation where the guy tried to say that one 
particular part of the pipeline design had a register inserted which was 
clocked on the negative edge.  I could never see how this would 
positively impact anything.  In fact, the setup and hold time of the 
register, not to mention the routing time, would add to the delay in 
that pipeline stage.

Was I missing something or is this ever used to advantage?

-- 

Rick C
On Saturday, 5/13/2017 5:52 PM, rickman wrote:
> I recall a processor implementation where the guy tried to say that one
> particular part of the pipeline design had a register inserted which was
> clocked on the negative edge.  I could never see how this would
> positively impact anything.  In fact, the setup and hold time of the
> register, not to mention the routing time, would add to the delay in
> that pipeline stage.
>
> Was I missing something or is this ever used to advantage?
Opposite edge pipe registers can be useful if your clock distribution scheme is not able to guarantee the required hold time. I've used this in early Xilinx parts that had only 4 internal clock buffers and I needed to bring in more (relatively slow) inputs using an additional clock. In those parts you could use "low skew nets" to route a clock, but even then you'd have hold time issues.

In that particular design everything on the poorly routed clocks went back and forth between clock edges. That included things like counters, which would typically use a single N-wide register and feedback from their own outputs. Instead I needed two N-wide registers (one on each clock edge) to remove hold time problems in the feedback paths. Obviously this would be painful to do a whole design in, but for me it worked well enough to get the data into distributed RAM for transfer to one of the internal global clock domains.

-- 
Gabor
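To put rough numbers on the hold-time argument, here is a minimal Python sketch. It is only an illustration under stated assumptions: the function name `hold_slack`, the 50% duty cycle, and every delay value are hypothetical and are not taken from any Xilinx datasheet.

```python
def hold_slack(tcq_min, logic_min, t_hold, skew, period, opposite_edge=False):
    """Return hold slack in ns for one register-to-register path.

    skew is how much later the capturing register sees its clock than the
    launching register does. A negative result means a hold violation.
    """
    # Capturing on the opposite edge puts half a period between the launch
    # edge and the previous capture edge (assumes a 50% duty cycle).
    half_period = period / 2 if opposite_edge else 0.0
    earliest_arrival = tcq_min + logic_min + half_period
    return earliest_arrival - (skew + t_hold)


# Hypothetical numbers: a slow 20 MHz clock on a poorly routed net.
same_edge = hold_slack(0.5, 0.2, 0.3, skew=2.0, period=50.0)
opposite = hold_slack(0.5, 0.2, 0.3, skew=2.0, period=50.0, opposite_edge=True)
print(f"same edge: {same_edge:.1f} ns, opposite edge: {opposite:.1f} ns")
```

With these made-up values the same-edge path fails hold, while the opposite-edge path gains half a period of margin. The price is that each hop gets only half a period for setup, which is why this suits relatively slow inputs.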
On 5/14/2017 4:14 PM, Gabor wrote:
> On Saturday, 5/13/2017 5:52 PM, rickman wrote:
>> I recall a processor implementation where the guy tried to say that
>> one particular part of the pipeline design had a register inserted
>> which was clocked on the negative edge.  I could never see how this
>> would positively impact anything.  In fact, the setup and hold time of
>> the register, not to mention the routing time, would add to the delay
>> in that pipeline stage.
>>
>> Was I missing something or is this ever used to advantage?
>>
>
> Opposite edge pipe registers can be useful if your clock distribution
> scheme is not able to guarantee the required hold time.  I've used
> this in early Xilinx parts that had only 4 internal clock buffers
> and I needed to bring in more (relatively slow) inputs using an
> additional clock.  In those parts you could use "low skew nets" to
> route a clock, but even then you'd have hold time issues.  In that
> particular design everything on the poorly routed clocks went back
> and forth between clock edges.  That included things like counters,
> which would typically use a single N-wide register and feedback from
> their own outputs.  Instead I needed two N-wide registers (one on
> each clock) to remove hold time in the feedback paths.  Obviously
> this would be painful to do a whole design in, but for me it worked
> enough to get the data into distributed RAM for transfer to one of
> the internal global clock domains.
This is an issue of poor clock distribution. The guy using the opposite edge registers was saying it added a pipeline stage the same as the positive edge registers. Even if this was done for all logic on all stages it would not be the same as adding more positive edge registers because it doesn't speed up the clock. In fact the added setup and hold time of the added register slows down the circuit.

-- 

Rick C
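The arithmetic behind this objection can be sketched in a few lines of Python. This is a simplified model, not a timing analyzer: it assumes a 50% duty cycle, zero clock skew, no added routing delay, and identical registers, and the function names and delay values are hypothetical.

```python
def min_period_plain(tcq, tsu, logic):
    """One stage between two rising-edge registers."""
    return tcq + logic + tsu


def min_period_negedge_split(tcq, tsu, logic_a, logic_b):
    """Same stage with a falling-edge register inserted in the middle.

    Each half of the logic must now fit in half a period, and each half
    pays its own clock-to-out and setup overhead.
    """
    worst_half = max(tcq + logic_a + tsu, tcq + logic_b + tsu)
    return 2 * worst_half


def min_period_posedge_split(tcq, tsu, logic_a, logic_b):
    """Same logic split by a rising-edge register (a true extra stage)."""
    return max(tcq + logic_a + tsu, tcq + logic_b + tsu)


# Hypothetical delays in ns: 8 ns of logic, split evenly when a register is added.
t_plain = min_period_plain(1.0, 0.5, 8.0)
t_neg = min_period_negedge_split(1.0, 0.5, 4.0, 4.0)
t_pos = min_period_posedge_split(1.0, 0.5, 4.0, 4.0)
print(t_plain, t_neg, t_pos)
```

In this model the falling-edge register lengthens the minimum period by one extra clock-to-out plus setup time, while a rising-edge register at the same point shortens it at the cost of a cycle of latency.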
On Saturday, May 13, 2017 at 11:52:37 PM UTC+2, rickman wrote:
> I recall a processor implementation where the guy tried to say that one
> particular part of the pipeline design had a register inserted which was
> clocked on the negative edge.  I could never see how this would
> positively impact anything.  In fact, the setup and hold time of the
> register, not to mention the routing time, would add to the delay in
> that pipeline stage.
>
> Was I missing something or is this ever used to advantage?

I guess there could be some way that the logic going to and from that register is fast enough that it would be possible to get an extra cycle for free.
On Saturday, May 13, 2017 at 6:52:37 PM UTC-3, rickman wrote:
> I recall a processor implementation where the guy tried to say that one
> particular part of the pipeline design had a register inserted which was
> clocked on the negative edge.  I could never see how this would
> positively impact anything.  In fact, the setup and hold time of the
> register, not to mention the routing time, would add to the delay in
> that pipeline stage.
Sometimes you want a pipeline stage to work in a different clock phase from other stages. This is sometimes done to fit the write-back stage and the op fetch stage in the same clock cycle. Another example was the original MIPS R2000, which used the same pins for both the instruction and data caches by using a different phase for the fetch pipeline stage.

And while it is something different, see how the three-stage ARM Cortex M0+ pipeline is made to look like a two-stage pipeline:

http://microchipdeveloper.com/32arm:m0-pipeline

The alternative is to use a clock with twice the frequency and have enables that make some stages work on even clocks and others on odd ones.

-- 
Jecel
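The double-frequency alternative can be illustrated with a tiny behavioral model. This is a minimal Python sketch rather than HDL, and the names (`simulate`, registers `a` and `b`) are hypothetical: one register is enabled on even ticks of the fast clock, the other on odd ticks, and both update simultaneously as real flip-flops would.

```python
def simulate(inputs):
    """Step two registers on a 2x clock; inputs has one entry per fast tick."""
    a = b = None
    trace = []
    for tick, x in enumerate(inputs):
        even = tick % 2 == 0
        # Compute both next values from the current state before updating,
        # so b sees the old value of a, as with same-edge flip-flops.
        next_a = x if even else a        # a is enabled on even ticks
        next_b = a if not even else b    # b is enabled on odd ticks
        a, b = next_a, next_b
        trace.append((tick, a, b))
    return trace


trace = simulate([10, 10, 20, 20, 30, 30])
for row in trace:
    print(row)
```

Each value reaches `b` one fast tick (half a slow cycle) after it was loaded into `a`, which mimics the opposite-edge behaviour while every flip-flop uses the same clock edge.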
> Was I missing something or is this ever used to advantage?
I imagine it was used to transfer slack from one stage to another. Imagine it's 1976, and you have everything laid out, but then you find that you have some stage with negative slack (let's say a multiplier) followed by a stage with positive slack (let's say a mux). It's hard to move registers back into the multiplier, partly because it would increase the number of FFs, and partly because it's 1976 and you'd have to re-tape everything. So you just have the mux grab the data on the falling clock edge, transferring half a period of slack from the mux to the multiplier so the multiplier has 1.5 cycles and the mux has 0.5. Something like that.
On Saturday, May 13, 2017 at 5:52:37 PM UTC-4, rickman wrote:
> I recall a processor implementation where the guy tried to say that one
> particular part of the pipeline design had a register inserted which was
> clocked on the negative edge.  I could never see how this would
> positively impact anything.  In fact, the setup and hold time of the
> register, not to mention the routing time, would add to the delay in
> that pipeline stage.
>
> Was I missing something or is this ever used to advantage?

I don't know if you have seen this before, but something similar is described in the book "But How Do It Know?" by J. Clark Scott:

https://www.amazon.com/But-How-Know-Principles-Computers/dp/0615303765

Someone made a video describing how it is useful for certain types of slow-clock CPUs:

https://www.youtube.com/watch?v=cNN_tTXABUA

If you look, the computation takes place nearer to the positive edge, and the write operation takes place nearer to the negative edge, so that enough time elapses in between to complete the work. I've seen several designs which trigger in this way.

There are also several methods described in (I believe) Lattice documentation, which show how to merge multiple clock signals together to obtain a clock signal that will dwell fire around the negative edge, and dwell fire around the positive edge, for various purposes.

Thank you,
Rick C. Hodgin
On Monday, 5/15/2017 2:29 PM, Kevin Neilson wrote:
>> Was I missing something or is this ever used to advantage?
>
> I imagine it was used to transfer slack from one stage to another.  Imagine it's 1976, and you have everything laid out, but then you find that you have some stage with negative slack (let's say a multiplier) followed by a stage with positive slack (let's say a mux).  It's hard to move registers back into the multiplier, partly because it would increase the number of FFs, and partly because it's 1976 and you'd have to re-tape everything.  So you just have the mux grab the data on the falling clock edge, transferring half a period of slack from the mux to the multiplier so the multiplier has 1.5 cycles and the mux has 0.5.  Something like that.
>

That implies that the minimum prop delay of the multiplier is guaranteed to be more than 1/2 clock period. Probably also a good bet in 1976. In any case this doesn't represent a pipe stage for 1/2 clock but rather for 1 1/2 clocks.

-- 
Gabor
> That implies that the minimum prop delay of the multiplier is
> guaranteed to be more than 1/2 clock period.  Probably also a
> good bet in 1976.  In any case this doesn't represent a pipe
> stage for 1/2 clock but rather for 1 1/2 clocks.
>

Yes, it depends on minimum delays (min times), so it's a poor design technique and would probably stop working when you shrink the die.
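The dependence on minimum delays can be checked with simple arithmetic. The following is a minimal Python sketch of the multiplier/mux scenario, assuming a 50% duty cycle and zero clock skew. The function name and all delay values are hypothetical, chosen only to show the setup and hold conditions.

```python
def check_borrowed_stage(period, tcq_min, tcq_max, tsu, th,
                         mult_min, mult_max, mux_max):
    """Slack (ns) when the multiplier result is captured on the falling
    edge 1.5 periods after launch and the mux gets the remaining 0.5."""
    return {
        # Slowest multiplier result must arrive before the capture edge at 1.5 T.
        "mult_setup": 1.5 * period - (tcq_max + mult_max + tsu),
        # Mux path runs from the falling edge to the next rising edge.
        "mux_setup": 0.5 * period - (tcq_max + mux_max + tsu),
        # Data launched one period later must not arrive before that same
        # capture edge has finished holding: min delay > T/2 + hold.
        "mult_hold": (tcq_min + mult_min) - (0.5 * period + th),
    }


# Hypothetical original process: slow multiplier with a large minimum delay.
original = check_borrowed_stage(10.0, 0.5, 1.0, 0.5, 0.3,
                                mult_min=6.0, mult_max=13.0, mux_max=2.0)
# Hypothetical die shrink: everything gets faster, including the minimum delay.
shrunk = check_borrowed_stage(10.0, 0.3, 0.6, 0.3, 0.2,
                              mult_min=3.0, mult_max=7.0, mux_max=1.0)
print(original)
print(shrunk)
```

With these made-up numbers the original design meets all three checks, but after the shrink the hold check goes negative even though both setup checks improve, which is the failure mode described above.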