> That implies that the minimum prop delay of the multiplier is
> guaranteed to be more than 1/2 clock period.  Probably also a
> good bet in 1976.  In any case this doesn't represent a pipe
> stage for 1/2 clock but rather for 1 1/2 clocks.
> 
Yes, it depends on minimum delays (min times), so it's a poor design technique and would probably stop working when you shrink the die.

On Monday, 5/15/2017 2:29 PM, Kevin Neilson wrote:
>> Was I missing something or is this ever used to advantage?
> 
> I imagine it was used to transfer slack from one stage to another.  Imagine it's 1976, and you have everything laid out, but then you find that you have some stage with negative slack (let's say a multiplier) followed by a stage with positive slack (let's say a mux).  It's hard to move registers back into the multiplier, partly because it would increase the number of FFs, and partly because it's 1976 and you'd have to re-tape everything.  So you just have the mux grab the data on the falling clock edge, transferring half a period of slack from the mux to the multiplier so the multiplier has 1.5 cycles and the mux has 0.5.  Something like that.
> 

That implies that the minimum prop delay of the multiplier is
guaranteed to be more than 1/2 clock period.  Probably also a
good bet in 1976.  In any case this doesn't represent a pipe
stage for 1/2 clock but rather for 1 1/2 clocks.
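
To put numbers on that constraint, here is a minimal sketch in Python. It
assumes an ideal 50% duty-cycle clock, no skew, and that clock-to-output
delay is folded into the path delay; the function names and the example
figures are made up for illustration, not taken from any real design.

```python
def half_cycle_borrow_window(period, t_setup, t_hold):
    """Delay window for a path launched on the rising edge at t=0 and meant
    to be captured on the falling edge at 1.5 * period.

    Returns (min_delay, max_delay). All values share one time unit.
    """
    # The falling edge at 0.5 * period must still capture the PREVIOUS value,
    # so the new data may not arrive before that edge's hold window has closed.
    min_delay = 0.5 * period + t_hold
    # The new data must be stable one setup time before the edge at 1.5 * period.
    max_delay = 1.5 * period - t_setup
    return min_delay, max_delay


def path_is_safe(d_min, d_max, period, t_setup, t_hold):
    """True if the fastest and slowest path delays both fit inside the window."""
    lo, hi = half_cycle_borrow_window(period, t_setup, t_hold)
    return d_min > lo and d_max < hi


if __name__ == "__main__":
    # Hypothetical numbers: 100 ns clock, 5 ns setup, 2 ns hold.
    print(half_cycle_borrow_window(100, 5, 2))
    print(path_is_safe(60, 140, 100, 5, 2))  # slow multiplier: fits the window
    print(path_is_safe(40, 140, 100, 5, 2))  # fastest path sped up: breaks
```

The second call is the die-shrink case: the slowest path still fits, but the
fastest path now beats the half-period bound and the wrong edge grabs it.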

-- 
Gabor

On Saturday, May 13, 2017 at 5:52:37 PM UTC-4, rickman wrote:
> I recall a processor implementation where the guy tried to say that one 
> particular part of the pipeline design had a register inserted which was 
> clocked on the negative edge.  I could never see how this would 
> positively impact anything.  In fact, the setup and hold time of the 
> register, not to mention the routing time, would add to the delay in 
> that pipeline stage.
> 
> Was I missing something or is this ever used to advantage?

I don't know if you have seen this before, but something similar is
described in the book, "But How Do It Know?" by J. Clark Scott:

    https://www.amazon.com/But-How-Know-Principles-Computers/dp/0615303765

Someone made a video describing how it is useful for certain types
of slow-clock CPUs:

    https://www.youtube.com/watch?v=cNN_tTXABUA

If you look, the computation takes place nearer to the positive edge,
and the write operation takes place nearer to the negative edge, so
that enough time elapses in between to complete the work.

I've seen several designs which trigger in this way.  There are also
several methods described in (I believe) Lattice documentation, which
shows how to merge multiple clock signals together to obtain a clock
signal that will dwell fire around the negative edge, and dwell fire
around the positive edge for various purposes.

Thank you,
Rick C. Hodgin

> Was I missing something or is this ever used to advantage?

I imagine it was used to transfer slack from one stage to another.
Imagine it's 1976, and you have everything laid out, but then you find
that you have some stage with negative slack (let's say a multiplier)
followed by a stage with positive slack (let's say a mux).  It's hard to
move registers back into the multiplier, partly because it would increase
the number of FFs, and partly because it's 1976 and you'd have to re-tape
everything.  So you just have the mux grab the data on the falling clock
edge, transferring half a period of slack from the mux to the multiplier
so the multiplier has 1.5 cycles and the mux has 0.5.  Something like that.
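
As a rough worked example of that slack transfer, the sketch below compares
the slack of the two stages before and after the register between them is
moved to the falling edge. The delays are invented, each stage delay is
assumed to already include register overhead, and the clock is assumed to
have a 50% duty cycle.

```python
def slack_before_and_after(period, d_first, d_second):
    """Slack of two back-to-back stages.

    'before': the register between them is clocked on the rising edge,
              so each stage gets one full period.
    'after':  that register is clocked on the falling edge, so the first
              stage gets 1.5 periods and the second gets 0.5.
    Returns ((slack_first, slack_second) before, (slack_first, slack_second) after).
    """
    before = (period - d_first, period - d_second)
    after = (1.5 * period - d_first, 0.5 * period - d_second)
    return before, after


if __name__ == "__main__":
    # Hypothetical: 100 ns clock, 120 ns multiplier, 30 ns mux.
    before, after = slack_before_and_after(100, 120, 30)
    print("before:", before)  # multiplier fails timing, mux has plenty to spare
    print("after: ", after)   # both stages now have positive slack
```

The total slack across the two stages is unchanged; the falling-edge register
only moves half a period of it from the second stage to the first.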

On Saturday, May 13, 2017 at 6:52:37 PM UTC-3, rickman wrote:
> I recall a processor implementation where the guy tried to say that one 
> particular part of the pipeline design had a register inserted which was 
> clocked on the negative edge.  I could never see how this would 
> positively impact anything.  In fact, the setup and hold time of the 
> register, not to mention the routing time, would add to the delay in 
> that pipeline stage.

Sometimes you want a pipeline stage to work in a different clock phase from the other stages. This is sometimes done to fit the write-back stage and the operand fetch stage into the same clock cycle. Another example was the original MIPS R2000, which used the same pins for both the instruction and data caches by using a different clock phase for the fetch pipeline stage.

And while it is something different, see how the three stage ARM Cortex M0+ pipeline is made to look like a two stage pipeline:

http://microchipdeveloper.com/32arm:m0-pipeline

The alternative is to use a clock with twice the frequency and have enables that make some stages work on even clocks and others on odd ones.
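
A small behavioural sketch of that alternative, in Python: every register
sees the same fast clock, and a toggling phase bit acts as the enable, so
stage A only loads on even ticks and stage B only on odd ticks. The stage
names and the trivial "logic" (plain copies) are placeholders chosen for
illustration.

```python
def run_two_phase(inputs):
    """Simulate two registers on one double-rate clock with even/odd enables.

    Stage A loads a new input on even fast-clock ticks.
    Stage B loads stage A's value on odd fast-clock ticks.
    Returns a list of (tick, stage_a, stage_b) after each fast-clock edge.
    """
    stage_a = None
    stage_b = None
    feed = iter(inputs)
    trace = []
    for tick in range(2 * len(inputs)):
        even = (tick % 2 == 0)
        # Compute both next-state values from the CURRENT state first,
        # as real registers on a common clock edge would.
        next_a = next(feed) if even else stage_a   # enable_a = even tick
        next_b = stage_a if not even else stage_b  # enable_b = odd tick
        stage_a, stage_b = next_a, next_b
        trace.append((tick, stage_a, stage_b))
    return trace


if __name__ == "__main__":
    for row in run_two_phase([10, 20, 30]):
        print(row)
```

Seen from the slow (half-rate) clock, stage A behaves like a rising-edge
register and stage B like a falling-edge one, but all flip-flops use a
single clock edge.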

-- Jecel

On Saturday, May 13, 2017 at 11:52:37 PM UTC+2, rickman wrote:
> I recall a processor implementation where the guy tried to say that one 
> particular part of the pipeline design had a register inserted which was 
> clocked on the negative edge.  I could never see how this would 
> positively impact anything.  In fact, the setup and hold time of the 
> register, not to mention the routing time, would add to the delay in 
> that pipeline stage.
> 
> Was I missing something or is this ever used to advantage?
> 

I guess there could be some way that the logic going to and from that register
is fast enough that it would be possible to get an extra cycle for free.

On 5/14/2017 4:14 PM, Gabor wrote:
> On Saturday, 5/13/2017 5:52 PM, rickman wrote:
>> I recall a processor implementation where the guy tried to say that
>> one particular part of the pipeline design had a register inserted
>> which was clocked on the negative edge.  I could never see how this
>> would positively impact anything.  In fact, the setup and hold time of
>> the register, not to mention the routing time, would add to the delay
>> in that pipeline stage.
>>
>> Was I missing something or is this ever used to advantage?
>>
>
> Opposite edge pipe registers can be useful if your clock distribution
> scheme is not able to guarantee the required hold time.  I've used
> this in early Xilinx parts that had only 4 internal clock buffers
> and I needed to bring in more (relatively slow) inputs using an
> additional clock.  In those parts you could use "low skew nets" to
> route a clock, but even then you'd have hold time issues.  In that
> particular design everything on the poorly routed clocks went back
> and forth between clock edges.  That included things like counters,
> which would typically use a single N-wide register and feedback from
> their own outputs.  Instead I needed two N-wide registers (one on
> each clock) to remove hold time in the feedback paths.  Obviously
> this would be painful to do a whole design in, but for me it worked
> enough to get the data into distributed RAM for transfer to one of
> the internal global clock domains.

This is an issue of poor clock distribution.  The guy using the opposite 
edge registers was saying it added a pipeline stage the same as the 
positive edge registers.  Even if this was done for all logic on all 
stages it would not be the same as adding more positive edge registers 
because it doesn't speed up the clock.  In fact the added setup and hold 
time of the added register slows down the circuit.
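
A back-of-the-envelope check of that claim, as a Python sketch. It models
register overhead as clock-to-output plus setup (an assumption about what
dominates), ignores routing and skew, and assumes a 50% duty cycle; the
numbers are invented.

```python
def min_period_single_stage(t_cq, t_su, t_logic):
    """Minimum clock period for one stage between two rising-edge registers."""
    return t_cq + t_logic + t_su


def min_period_with_opposite_edge_reg(t_cq, t_su, t_first, t_second):
    """Minimum clock period when a falling-edge register splits the stage.

    Each half of the logic now has to fit in HALF a period, and each half
    pays its own clock-to-output and setup overhead.
    """
    worst_half = t_cq + max(t_first, t_second) + t_su
    return 2 * worst_half


if __name__ == "__main__":
    # Hypothetical: 1 ns clock-to-out, 1 ns setup, 10 ns of logic.
    print(min_period_single_stage(1, 1, 10))               # 12
    print(min_period_with_opposite_edge_reg(1, 1, 5, 5))   # 14, even split
    print(min_period_with_opposite_edge_reg(1, 1, 7, 3))   # 18, uneven split
```

Even with a perfectly balanced split the minimum period grows by one extra
register overhead, and an uneven split makes it worse.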

-- 

Rick C

On Saturday, 5/13/2017 5:52 PM, rickman wrote:
> I recall a processor implementation where the guy tried to say that one 
> particular part of the pipeline design had a register inserted which was 
> clocked on the negative edge.  I could never see how this would 
> positively impact anything.  In fact, the setup and hold time of the 
> register, not to mention the routing time, would add to the delay in 
> that pipeline stage.
> 
> Was I missing something or is this ever used to advantage?
> 

Opposite edge pipe registers can be useful if your clock distribution
scheme is not able to guarantee the required hold time.  I've used
this in early Xilinx parts that had only 4 internal clock buffers
and I needed to bring in more (relatively slow) inputs using an
additional clock.  In those parts you could use "low skew nets" to
route a clock, but even then you'd have hold time issues.  In that
particular design everything on the poorly routed clocks went back
and forth between clock edges.  That included things like counters,
which would typically use a single N-wide register and feedback from
their own outputs.  Instead I needed two N-wide registers (one on
each clock) to remove hold time in the feedback paths.  Obviously
this would be painful to do a whole design in, but for me it worked
enough to get the data into distributed RAM for transfer to one of
the internal global clock domains.
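
For readers who want to see the two-register counter spelled out, here is a
behavioural sketch in Python. It is one reading of the description above,
not the actual design: the class and method names are invented, and clock
edges are modelled as explicit method calls.

```python
class PingPongCounter:
    """N-bit counter split across a rising-edge and a falling-edge register.

    The feedback path never goes from a register back to a register on the
    same clock edge, so each hop has about half a period of hold margin.
    """

    def __init__(self, width):
        self.mask = (1 << width) - 1
        self.rise_reg = 0  # N-wide register clocked on the rising edge
        self.fall_reg = 0  # N-wide register clocked on the falling edge

    def rising_edge(self):
        # Increment from the falling-edge copy, which has been stable
        # since the previous falling edge.
        self.rise_reg = (self.fall_reg + 1) & self.mask

    def falling_edge(self):
        # Copy the new count back half a period later.
        self.fall_reg = self.rise_reg

    def clock(self):
        """One full clock period: rising edge, then falling edge."""
        self.rising_edge()
        self.falling_edge()
        return self.rise_reg


if __name__ == "__main__":
    counter = PingPongCounter(3)
    print([counter.clock() for _ in range(9)])
```

The count still advances once per clock period; the cost is the second
N-wide register and a feedback path limited to half a period per hop.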

-- 
Gabor

I recall a processor implementation where the guy tried to say that one 
particular part of the pipeline design had a register inserted which was 
clocked on the negative edge.  I could never see how this would 
positively impact anything.  In fact, the setup and hold time of the 
register, not to mention the routing time, would add to the delay in 
that pipeline stage.

Was I missing something or is this ever used to advantage?

-- 

Rick C