FPGARelated.com
Forums

Do two clock system blocks with one clock running half of other's need asynchronous input/output buffers?

Started by Weng Tianxiang October 4, 2008
Hi,
I have no experience with this and would like to ask:

Is it possible for two clocked system blocks, one running at half the
clock rate of the other and both derived from the same clock source,
to exchange data without asynchronous input/output buffers, given
good circuit and logic design?

For example, in Intel and AMD CPU chips, cache I runs at half the
frequency of the CPU clock and, as the documents show, gets almost
1/2 the data rate.

How do their designs achieve this? I know Xilinx chips have divided
clock outputs in addition to the main clock output, but I have never
used that technique.

I would appreciate guidance and direction on the subject. A book or
a paper reference is preferred.

Thank you.

Weng
Hi Weng,

I am not sure about your design platform, but from an FPGA perspective:

If the two clocks are locked (frequency and phase), then you can consider them synchronised, assuming we trust the generating source. In this case there is no need to make extra efforts to cross domains. The issue you should be aware of is that they might not always be in phase as assumed, and in this case any phase-sensitive logic may occasionally fail. For example, a pulse generated in the fast domain may fail to be seen by the edge of the slow clock; this commonly leads to power-up problems. If the clocks are not in phase "by design, for some reason", then your compiler should tell you if there are any setup or hold violations. If there is a violation, I will consider them asynchronous.

For asynchronous clocks inside FPGAs, I normally use dual-clock FIFOs for the main crossing areas. Alternatively, you can base your crossing plans on double-register synchronisation and correct data transfer.

If your clocks are external (between chips), as I understand from your description, then this is a different matter. Board delay differences are inevitable. All I can say is that they are asynchronous. So you had better cross domains with care or lock them together (e.g. inside an FPGA, but this requires costly loop design).

Remember that a phase-locked loop uses phase difference to lock two frequencies, but this doesn't usually mean they are locked with respect to absolute phase unless extra design effort is added.

Kadhiem
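To make the "pulse in the fast domain missed by the slow clock" case concrete, here is a minimal behavioural sketch in Python. It is only an illustration under simple assumptions: the slow clock is exactly the fast clock divided by 2, signals are ideal 0/1 values, and the names `slow_domain_samples`, `stretch` and `pulse` are made up for this example.

```python
def slow_domain_samples(fast_signal, phase=0):
    """Sample a fast-domain signal on every second fast cycle.

    `phase` (0 or 1) selects which fast edges coincide with the
    divide-by-2 clock. It stands in for the power-up uncertainty.
    """
    return fast_signal[phase::2]


def stretch(fast_signal, width=2):
    """Hold every high value for `width` fast cycles.

    With width=2 a divide-by-2 sampler sees the pulse exactly once,
    whichever phase the divider started in.
    """
    out = [0] * len(fast_signal)
    for i, value in enumerate(fast_signal):
        if value:
            for j in range(i, min(i + width, len(out))):
                out[j] = 1
    return out


# One-cycle pulse at fast cycle 3 (hypothetical test pattern).
pulse = [0, 0, 0, 1, 0, 0, 0, 0]

print(slow_domain_samples(pulse, phase=0))           # [0, 0, 0, 0]  pulse missed
print(slow_domain_samples(pulse, phase=1))           # [0, 1, 0, 0]  pulse seen
print(slow_domain_samples(stretch(pulse), phase=0))  # [0, 0, 1, 0]
print(slow_domain_samples(stretch(pulse), phase=1))  # [0, 1, 0, 0]
```

Whether the unstretched pulse is seen depends only on the divider phase, which is why this kind of bug tends to show up as an intermittent power-up problem.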
Hi Kadhiem,

Thank you for your response.

I am studying the Intel 82496/82491 cache II controller chip and cache II SRAM chip running at 66 MHz. The book was published in 1994. I want to learn how they designed 3 chips on the board, including the Pentium processor, to run with the same cycles from the same clock source. The book doesn't mention that asynchronous input/output FIFOs are used.

Now the cache II controller and cache II SRAM are included in the new multiprocessor chips. I am wondering how they design the multiprocessor chip: according to Intel documents, 4 processors run at 2 GHz or so, and all their cache I controllers and cache I SRAMs run at half rate (1 data item per cycle for the core and 1 data item per 2 cycles for cache I).

If I were the designer, there might be two choices:
1. The cache I controller and cache I SRAM run on a clock which is the main clock source divided by 2, with input/output asynchronous FIFOs in the interface;
2. The cache I controller and cache I SRAM run on a clock which is the main clock source NOT divided by 2, withOUT input/output asynchronous FIFOs in the interface, and with an enable signal to run them at half rate.

Option 1 is reliable, but has a performance penalty. Based on my experience, it seems one cannot get the stated data rate with input/output asynchronous FIFOs in the interface.

Option 2 is reliable too, but it uses more energy, because its clock runs at double the rate of option 1. But it guarantees one data item per 2 cycles for cache I.

It seems to me that they must use option 2 instead of option 1.

I would like the experts' opinion, even if it is only talk.

Weng
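As a rough illustration of option 2, the following Python sketch models a block that is clocked by the full-rate clock but only acts when an enable signal is high. It is a cycle-level toy model, not a description of any Intel design; `run_with_enable` and its arguments are hypothetical names chosen for the example, and the enable is assumed to simply toggle on every clock edge.

```python
def run_with_enable(requests, cycles):
    """Serve queued requests on a full-rate clock gated by a toggling enable.

    The block sees every clock edge, but `enable` is high only on every
    second edge, so it completes at most one request per 2 clock cycles.
    Returns a list of (cycle, request) pairs.
    """
    enable = False
    pending = list(requests)
    served = []
    for cycle in range(cycles):
        enable = not enable          # toggles each edge: high on cycles 0, 2, 4, ...
        if enable and pending:
            served.append((cycle, pending.pop(0)))
    return served


print(run_with_enable(["a", "b", "c"], cycles=8))
# [(0, 'a'), (2, 'b'), (4, 'c')]
```

Because everything runs from one clock, there is no domain crossing to synchronise; the cost is that the clock network of the slow block still toggles at the full rate.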
Weng, let me explain the basics:

You want to drive a system with two clocks, one of which has half the frequency of the other.

The important question is now: what is the phase relationship between the frequencies? Or, in simpler terms, assuming you use rising-edge triggering of the flip-flops and registers: what is the timing delay between the rising edges of both clocks?

If you are sure that there is no delay (which I would never really believe), then there is no problem.

If, however, there is a short systematic delay, where the rising edge of f2 is always a few ns later than the rising edge of f1, then any data transfer from f1-based to f2-based logic might be unreliable, because the f2 clock might pick up either the old data or the new data that had just been changed by f1. That would be a race condition, or a hold-time violation. In the opposite direction there is no problem, provided you still have enough set-up time available after you lost some due to the phase difference.

This all assumes that the phase relationship is known and stable. If it isn't, then you should treat the phase relationship as unknown and use asynchronous FIFOs or some handshaking.

If your system is slow, you can deliberately offset the rising edges by half a period of the faster clock, which would give you a well-defined timing relationship and clock margin (but you give up half the potential speed).

Peter Alfke, still there, lurking on weekends...
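The hold and set-up argument above can be put into numbers with a small Python sketch. This is a simplified single-path model, not a substitute for a timing analyser: the f2 edge is assumed to arrive `skew_ps` after the f1 edge, all delays are given in picoseconds, and the function names and the example values are invented for illustration.

```python
def hold_slack_f1_to_f2_ps(skew_ps, clk_to_q_ps, path_min_ps, hold_ps):
    """Hold slack for data launched by f1 and captured by the later f2 edge.

    New data reaches the f2 register clk_to_q + path_min after the f1 edge.
    The f2 register needs the old data stable until skew + hold.
    A negative result means the f2 register may capture the new data (a race).
    """
    return (clk_to_q_ps + path_min_ps) - (skew_ps + hold_ps)


def setup_slack_f2_to_f1_ps(skew_ps, f1_period_ps, clk_to_q_ps, path_max_ps, setup_ps):
    """Set-up slack for data launched by the late f2 edge and captured by the next f1 edge.

    The launch happens skew late, so only (f1_period - skew) is available.
    """
    return (f1_period_ps - skew_ps) - (clk_to_q_ps + path_max_ps + setup_ps)


# Illustrative numbers only (assumed, not taken from any data sheet).
print(hold_slack_f1_to_f2_ps(skew_ps=300, clk_to_q_ps=100, path_min_ps=100, hold_ps=50))
# -150  -> hold violation: f2 can pick up the new data

print(setup_slack_f2_to_f1_ps(skew_ps=300, f1_period_ps=2000,
                              clk_to_q_ps=100, path_max_ps=1200, setup_ps=100))
# 300   -> set-up still met, with 300 ps of the period lost to the skew
```

With these assumed values the f1-to-f2 direction fails on hold while the f2-to-f1 direction still meets set-up, which matches the asymmetry described in the post.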
Hi Weng,

When I was struggling with metastability, Philip Freidin was good
enough to point out where I was going wrong. I think his explanation
was very clear and helpful - it's at http://tinyurl.com/473w92 if you
want to have a look...

Cheers,
     Simon (just giving back, and feeling good about it :)

Hi Peter,

Glad to receive your advice again.

"This all assumes that the phase relationship is known and stable. If it isn't, then you should treat the phase relationship as unknown and use asynchronous FIFOs or some handshaking."

I fully agree with the above point.

Weng
Hi Simon,

I read the recommended comments, and they describe the standard asynchronous input/output handshaking. Doing it costs 4 clocks: 2 for the outgoing control signals and 2 for the returning control signals. Peter has a very nice paper describing these situations.

But I don't think the Intel 2 GHz chip uses the handshaking method, since every signal would be delayed by at least 4 clocks, while their documents state 2 clocks for the cache I part.

So I think Intel is using an enable signal to control the cache I part, making the full core and cache I one synchronous system without sacrificing any data delays.

I have experience designing a successful system with 3 clock rates in a Xilinx chip, using the global clock as the only clock source and using auxiliary enable signals to control the slow parts of the design.

For a high-performance processor chip, 4 clock delays are unacceptable. Do you agree? Peter too?

Thank you.

Weng
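The "2 clocks out, 2 clocks back" count can be reproduced with a small Python model of two-flop synchronisers. It is a simplification made for counting only: both sides are assumed to run on the same clock, metastability is not modelled, the receiver is assumed to acknowledge as soon as it sees the request, and all function names are hypothetical.

```python
def two_flop_sync(before_edge):
    """Two-flop synchroniser.

    before_edge[i] is the input value present just before clock edge i;
    the returned list holds the synchroniser output just after edge i.
    """
    ff1 = ff2 = 0
    after_edge = []
    for value in before_edge:
        ff2, ff1 = ff1, value   # both flops update on the same edge
        after_edge.append(ff2)
    return after_edge


def as_next_input(after_edge):
    """A value visible after edge i is what the next stage samples at edge i + 1."""
    return [0] + after_edge[:-1]


def handshake_round_trip(edges=8):
    """Clock edges from raising the request until the sender sees the acknowledge."""
    req = [1] * edges                        # request held high from the start
    req_seen = two_flop_sync(req)            # 2 edges to reach the receiver
    ack_seen = two_flop_sync(as_next_input(req_seen))  # 2 more edges back
    return ack_seen.index(1) + 1


print(handshake_round_trip())  # 4
```

Under these assumptions the round trip is 4 clock edges; at a 2 GHz clock (0.5 ns period) that would be 2 ns per handshake.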
Just speculating, but I'd bet large processors use derived clocks,
with known phase relationships, to avoid the typical asynchronous
clock boundary crossing logic wherever possible. Running large cache
structures at twice the clock frequency needed is too power hungry.
They might have been able to get away with it in the past, when caches
were smaller and power consumption was less important, but not any
longer.

Andy
Hi Andy,

Glad to hear from you again. We are always on opposite sides of any coin.

Peter misses the point and your response hits the point:
"I'd bet large processors use derived clocks, with known phase relationships, to avoid the typical asynchronous clock boundary crossing logic wherever possible."

I disagree with you. How can they manage the huge range of temperatures that causes clock circuit shifting?

Peter, what is your opinion? From a Xilinx FPGA designer's point of view, there is a large uncertainty range for derived clocks. As I remember, it may be at least 300 ps over the temperature range.

Thank you.

Weng
On Wed, 8 Oct 2008 18:38:04 -0700 (PDT), Weng Tianxiang
<wtxwtx@gmail.com> wrote:

>On Oct 8, 10:04 am, Andy <jonesa...@comcast.net> wrote:
>Peter misses the point and your response hits the point:
>"I'd bet large processors use derived clocks,
>with known phase relationships, to avoid the typical asynchronous
>clock boundary crossing logic wherever possible. "
>
>I disagree with you. How can they manage the huge range of
>temperatures that causes clock circuit shifting.

There is probably a lot less logic involved in continuously auto-calibrating a clock DLL to eliminate temperature/voltage drift than there is in boundary crossing logic for a 256-bit or wider internal bus.

- Brian