FPGARelated.com
Forums

ddr clock issues

Started by David Ashley September 18, 2006
Open Cores DDR controller uses 2 DCM's to generate the clocks.

clk -> dcm0 -> clock used for fddr to produce true + negative ddr clocks
               feedback comes from true ddr clock
               fddr has hard wired 01 inputs for true clock,
               10 inputs for negative clock

clk -> dcm1 -> (0 clock)   bufg1 -> clock used for all ddr related internal logic
            -> (270 clock) bufg2 -> clock used for fddr's for DDR's data in lines
               feedback comes from the output of bufg1


dcm0 has a tunable parameter, a phase shift of 30 ps. I've moved this all the
way to -530 ps with no failure. It seems irrelevant.

I want to get rid of one of the DCMs; two seems excessive. Is it common to use
an fddr to get a clock to the outside this way? That is, an fddr has fixed
inputs (input0 <= '0', input1 <= '1'), so the fddr output is really just a
data selector: when the input clock is low you get input0, when high you get
input1. Why not route the clock through to the outside directly?

I've tried hanging the DDR's clock off of bufg1 (still going through fddr)
but it doesn't work reliably, I get flaky data.

Where can I find info about clock generation issues, specifically related to
DDR? I never would have come up with the scheme that seems to actually work in
this case. Is it possible to do this with just one DCM?

Thanks--
Dave

-- 
David Ashley                http://www.xdr.com/dash
Embedded linux, device drivers, system architecture
David Ashley wrote:
[snip]
I found a Xilinx app note, xapp802.pdf, which has a nice block diagram of an
approach with just one DCM on page 3. It is written for Virtex, but I'd hope
Spartan-3E would be the same...

-Dave
David Ashley wrote:
[snip]
> I want to get rid of one of the DCM's, 2 seems excessive. Is it common
> to use an fddr to get a clock to the outside this way? That is, an fddr
> has fixed inputs (input0 <= '0', input1 <= '1') and so the fddr output is
> really just a data selector, when the input clock is low you get input0,
> when high you get input1. Why not route the clock through to the outside
> directly?
>
> I've tried hanging the DDR's clock off of bufg1 (still going through fddr)
> but it doesn't work reliably, I get flaky data.
>
> Where can I find info about clock generation issues, specifically
> related to ddr.
The FDDR is used to generate the external clock with the same clock-to-output
delay as the associated data lines. Routing a clock to an output buffer
requires non-clock routing resources in the Xilinx parts. The FDDR takes the
global clock (very low skew) directly from the dedicated clock routing, and
its delay is matched to the clock-to-out delay of the DDR flops on the DQ bus.
So if you use a DCM and global clock resources to generate the internal clocks
for DQ and for the clock output, you directly set the phase relationship
between the clock output and DQ.

When you try to route the clock straight to an output buffer, you are at the
mercy of the router. Even if you get the design to work, the timing may change
when you rebuild, due to changes in seemingly unrelated sections of the design.
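To make the clock-forwarding idea concrete, here is a minimal simulation-only sketch in plain VHDL. It is not the vendor primitive: `ddr_out_model` and the testbench names are made up for illustration, and which constant lands on which data input for the "01"/"10" wiring is an assumption. In a real design the device's own DDR output register in the IOB would take the place of this model.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Simulation-only stand-in for a DDR output register (hypothetical name).
-- q takes d0 on the rising edge of c0 and d1 on the rising edge of c1.
entity ddr_out_model is
  port (
    c0, c1 : in  std_logic;
    d0, d1 : in  std_logic;
    q      : out std_logic
  );
end entity ddr_out_model;

architecture behav of ddr_out_model is
begin
  process (c0, c1)
  begin
    if rising_edge(c0) then
      q <= d0;
    elsif rising_edge(c1) then
      q <= d1;
    end if;
  end process;
end architecture behav;

library ieee;
use ieee.std_logic_1164.all;

-- Self-checking testbench: forward a 100 MHz clock as a true/inverted pair.
entity ddr_clk_forward_tb is
end entity ddr_clk_forward_tb;

architecture sim of ddr_clk_forward_tb is
  signal clk       : std_logic := '0';
  signal clk_n     : std_logic := '1';
  signal ddr_clk_p : std_logic;
  signal ddr_clk_n : std_logic;
  signal done      : boolean := false;
begin
  -- 10 ns period; the clock stops toggling once the checks are done.
  clk   <= not clk after 5 ns when not done else '0';
  clk_n <= not clk;

  -- True clock: '1' is launched on the rising edge, '0' on the falling edge.
  fwd_p : entity work.ddr_out_model
    port map (c0 => clk, c1 => clk_n, d0 => '1', d1 => '0', q => ddr_clk_p);

  -- Inverted clock: same register, constants swapped.
  fwd_n : entity work.ddr_out_model
    port map (c0 => clk, c1 => clk_n, d0 => '0', d1 => '1', q => ddr_clk_n);

  check : process
  begin
    wait for 7 ns;   -- 7 ns: clk went high at 5 ns
    assert ddr_clk_p = '1' and ddr_clk_n = '0'
      report "forwarded clocks wrong in the high phase" severity failure;
    wait for 5 ns;   -- 12 ns: clk went low at 10 ns
    assert ddr_clk_p = '0' and ddr_clk_n = '1'
      report "forwarded clocks wrong in the low phase" severity failure;
    report "forwarded clock pair tracks clk" severity note;
    done <= true;
    wait;
  end process check;
end architecture sim;
```

The point of the sketch is only that the forwarded clock leaves through the same kind of register as the data, so whatever clock-to-out delay the data sees, the clock sees too. The model says nothing about real pad delays.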
Gabor wrote:
[snip]
In experiments I had been able to get rid of the FDDRs on the true and
inverted DDR clock outputs, but I just did that to see if it would work. It's
pointless, since the FDDRs are part of the IOBs anyway and conserving them
doesn't make them available for any other function.

However, I wasn't able to get rid of the 2nd DCM, and I'm running out of ideas
to try.

One thing of note -- this is on the Spartan-3E starter board. It supplies a
50 MHz clock. I run this through a DCM to produce 100 MHz, and that's used to
feed the other 2 DCMs. I kind of remember this is not a good idea?

Unfortunately (according to my understanding of the DCMs) you can't both get a
multiplied output clock from a DCM and have the 0, 90, 180 and 270 phases of
that clock. So I don't know how to accomplish this other than stringing DCMs
together. Or get an external 100 MHz crystal oscillator and put it into the
socket.

Thanks--
Dave
David Ashley wrote:
> One thing of note -- this is on the spartan-3e starter board.
> It supplies a 50 mhz clock. I run this through a DCM to produce
> 100 mhz, and that's use to feed the other 2 DCM's. I kind of
> remember this is not a good idea?
The whole issue of DDR clock management and pin constraints is an area I'm not
too comfortable with. I wish X and A would include with their development
boards a _simple_ example frobbing their SDRAM. Just enough to show that it's
working, not a complete controller. I can design the logic for an SDRAM
controller (DDR or SDR) just fine, but it seems every FPGA (and board) has a
different clocking methodology and different constraint requirements.

David, I hope you find the solution and share it with us :-)

I assume there won't be too much difference between the ML401 (Virtex 4) and
the Spartan 3E starter kit.

Tommy
Tommy Thorn wrote:
[snip]
I will certainly share whatever I learn.

One thing just occurred to me. BTW I don't have any test equipment: no 'scope,
no logic analyzer, nothing. Just a crappy digital multimeter. So I can't hook
a scope up and look at the signals going into the DDR itself.

For some reason I think the data going into the DDR is good. The OpenCores
controller does include logic to generate the DQS strobe the DDR uses to latch
input data. That approach would tend to balance out timing problems -- the
same logic that drives the data also drives the DQS strobe, so they should
sink or swim together, I suppose.

But the OpenCores DDR controller doesn't make use of the DQS strobe generated
by the DDR device itself. I'm only trying to run at 100 MHz. In that case the
Xilinx app notes say the timing is adequate, so the DQS strobe isn't needed to
capture data reliably. Maybe the timing would get easier if the logic made use
of the DQS strobe from the DDR.

I have a feeling adding some constraints would make the thing work with a
single DCM. Unfortunately I have no clue what constraints to add, as I don't
know what's going wrong (and I don't know much about writing constraints
anyway).

To get around the lack of test equipment, when the thing wasn't working before
I created a module called "vgatext" which outputs a stable 96x40 text display
to the VGA outputs. Then I set it up so that in the event of a DDR error it
displays the desired data vs the actual data -- and saw it was an off-by-one
problem. That was a problem in my logic, not in the timing.

-Dave
David Ashley <dash@nowhere.net.dont.email.me> wrote:

[snip]
All you need is a normal clock and a clock phase-shifted by 90 degrees. The
whole clocking outside the fpga thing is unnecessary. If you place the output
flipflops inside the IOBs and use an fddr in the IOB to replicate the internal
clock, all signals connected to the DDR memory will have the same delay.

-- 
Reply to nico@nctdevpuntnl (punt=.)
Companies and shops can be found at www.adresboekje.nl
Nico Coesel wrote:
> All you need is a normal clock and a 90 degrees phase shifted clock.
> The whole clocking outside the fpga thing is unnecessary. If you place
> the output flipflops inside the IOBs and use an fddr in the IOB to
> replicate the internal clock, all signals connected to the DDR memory
> will have the same delay.
But the DDR spec says the DQS strobe for data written to the DDR must be
center-aligned with the data. The DQS is in phase with the DDR clock. That
means the data must be put on the lines a quarter of a clock cycle (half of a
half-cycle) early for proper alignment. This requires a clock that is 270
degrees out of phase with the DDR's clock. This is the clock used for the data
lines going into the DDR.

I don't understand the "clocking outside the fpga" you mention. The FPGA
currently has one 50 MHz external clock source. I run that through a DCM to
make it 100 MHz. Then, in order for the DDR to work, I need to use two more
DCMs. One is used to make the DDR clocks (positive and negative). The other is
used for everything else.

-Dave
David Ashley wrote:
[snip]
> But the open cores DDR doesn't make use of the DQS strobe generated
> by the DDR device itself. I'm only trying to run at 100 mhz. In that
> case xilinx app notes say the timing is adequate so the DQS strobe isn't
> needed to capture data reliably. Maybe the timing would get easier if
> the logic made use of the DQS strobe from the DDR.
I'm doing pretty much the same thing with Virtex 2 (similar architecture to
Spartan 3) on a proprietary board. This board has a 66.66 MHz clock that is
doubled to run the DDR at 133 MHz (266 DDR). I do not use the DQS inputs for
sampling data. I did need to tweak the delay in my DCMs to get reliable
sampling. I did not use any expensive test equipment for this; I just used the
variable delay mode of the DCM to run tests at various phases, then centered
the final fixed value within the range that seemed to work. At 100 MHz I would
expect the timing margins to be quite good even in the slowest speed grade
parts. I'm using a Virtex 2 -5 speed grade part in my 133 MHz design.
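The centering step can be written down as a small helper. This is only a sketch of the bookkeeping, assuming the pass/fail result of each phase setting has already been collected into a vector; the package, the function `window_center` and the sweep values below are all hypothetical, and the sketch does not handle a passing window that wraps around the end of the phase range.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

package phase_sweep_pkg is
  -- pass(i) = '1' means the memory test passed at phase step i.
  -- Returns the middle index of the longest run of passes (counting from 0),
  -- or -1 if no step passed.
  function window_center (pass : std_logic_vector) return integer;
end package phase_sweep_pkg;

package body phase_sweep_pkg is
  function window_center (pass : std_logic_vector) return integer is
    -- Normalize the index range so the result does not depend on how the
    -- caller declared the vector.
    alias p : std_logic_vector(0 to pass'length - 1) is pass;
    variable run_start  : integer := 0;
    variable run_len    : integer := 0;
    variable best_start : integer := 0;
    variable best_len   : integer := 0;
  begin
    for i in p'range loop
      if p(i) = '1' then
        if run_len = 0 then
          run_start := i;
        end if;
        run_len := run_len + 1;
        if run_len > best_len then
          best_len   := run_len;
          best_start := run_start;
        end if;
      else
        run_len := 0;
      end if;
    end loop;
    if best_len = 0 then
      return -1;
    end if;
    return best_start + (best_len - 1) / 2;
  end function window_center;
end package body phase_sweep_pkg;

library ieee;
use ieee.std_logic_1164.all;
use work.phase_sweep_pkg.all;

entity phase_sweep_tb is
end entity phase_sweep_tb;

architecture sim of phase_sweep_tb is
  -- Made-up sweep result: phase steps 3 to 9 pass, the rest fail.
  constant SWEEP    : std_logic_vector(0 to 12) := "0001111111000";
  constant ALL_FAIL : std_logic_vector(0 to 3)  := "0000";
begin
  process
  begin
    assert window_center(SWEEP) = 6
      report "expected the middle of steps 3..9" severity failure;
    assert window_center(ALL_FAIL) = -1
      report "expected -1 when nothing passes" severity failure;
    report "window_center checks passed" severity note;
    wait;
  end process;
end architecture sim;
```

Picking the middle of the widest passing run leaves roughly equal margin on both sides, which is the reason for centering rather than taking the first setting that happens to work.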
> I have a feeling adding some constraints would make the thing work
> with a single DCM. Unfortunately I have no clue what constraints to
> add, as I don't know what's going wrong (and don't know much about
> constraints writing anyway).
The problem with a single DCM is that you need to make up for phase
differences in the board routing. Signals to the DDR memory arrive there some
propagation delay after they leave the FPGA. At the memory end they need to
meet setup and hold time relative to the clock as it arrives at the memory,
usually with the same board routing delay as the clock. So if your clock and
data/address/control outputs use the same internal clock, you would need to
use board routing or some other delay element external to the FPGA to ensure
hold time is met at the memory.

Then the data returning from the memory shows up 2 board propagation delays
after the driven clock, plus the clock-to-output time specified in the memory
datasheet. So the sampling point isn't exactly centered within the outgoing
clock half-period, and your sampling clock may need to be off by some phase
other than 90 degrees from the clock driving your outputs.

All of this is pretty hard to accomplish with one DCM, IMHO. And just adding
timing constraints without a mechanism to meet them makes life miserable for
the tools, which usually fail miserably in response (they have only internal
routing delays with which to make up your requested timing).
David Ashley wrote:
> I will certainly share whatever I learn.
I got my simple write/read-verify system to work. I was able to get rid of one
of the DCMs, so I only need 2.

DCM #1 takes the 50 MHz input, and I use its 2X output to drive a clock
buffer. This is the tclock signal. Feedback comes from the clock buffer.

DCM #2 takes tclock and produces the 4-phase outputs. The 0 and 270 signals
drive 2 clock buffers. The buffered 0 clock goes back into the feedback input
on the DCM. These signals are sys_clk and sys_clk270.

FDDRs are used to produce the DDR's clock. Their inputs are hardwired to "01"
for the true clock and "10" for the negative clock. Both FDDRs are clocked by
sys_clk and inverted sys_clk. The inverter is implicit in the FDDR
configuration, so no delay penalty exists.

Here's the trick: the original OpenCores DDR controller source sampled the
data from the DDR on the rising and falling edges of sys_clk. I instead push
out the sampling by 1/4 of a cycle:

  rising_edge(sys_clk)   replaced by   falling_edge(sys_clk270)
  falling_edge(sys_clk)  replaced by   rising_edge(sys_clk270)

Then I made a slight tweak to get the sampled data back into the sys_clk
domain, as required elsewhere.

It works fine. I had a feeling the problem was on the sampling side, since no
special machinery existed to sample in the middle of the window in which the
data was valid. The setup time was not being met.

Here's a sample of the before code:

-- **** CODE BEFORE FIX
process (sys_clk)
begin
  if rising_edge(sys_clk) then
    -- sample HI-data word with rising edge
    data_hi_q <= data;
    -- store HI- and LO-data word in 32bit output register
    data_out_q <= data_hi_q & data_lo2_q;
  end if;
end process;
-- ...
process (sys_clk)
begin
  if falling_edge(sys_clk) then
    -- sample LO-word with falling edge
    data_lo1_q <= data;
    -- 1 clock additional delay to store HI- and LO-word
    -- with the next rising edge as 32bit word
    data_lo2_q <= data_lo1_q;
  end if;
end process;

-- ***** CODE AFTER FIX
process (sys_clk270)
begin
  if falling_edge(sys_clk270) then
    -- sample HI-data word (was: rising edge of sys_clk)
    data_hi_q <= data;
  end if;
end process;

process (sys_clk) -- (DA) fix to get back into sys_clk domain
begin
  if rising_edge(sys_clk) then
    -- store HI- and LO-data word in 32bit output register
    data_out_q <= data_hi_q & data_lo2_q;
  end if;
end process;
-- ...
process (sys_clk270)
begin
  if rising_edge(sys_clk270) then
    -- sample LO-word (was: falling edge of sys_clk)
    data_lo1_q <= data;
    -- 1 clock additional delay to store HI- and LO-word
    -- with the next rising edge as 32bit word
    data_lo2_q <= data_lo1_q;
  end if;
end process;

Hope this is of use to other people.

-Dave
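As a sanity check on the edge substitution, the following self-contained testbench models the two clocks behaviorally and confirms that each replacement edge falls a quarter cycle after the edge it replaces. It is a sketch under stated assumptions: ideal 100 MHz clocks with no skew or jitter, a plain delay standing in for the DCM's 270 degree output, and a hypothetical testbench name. It does not model the DDR data or the real DCM.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity quarter_cycle_tb is
end entity quarter_cycle_tb;

architecture sim of quarter_cycle_tb is
  constant PERIOD   : time := 10 ns;             -- 100 MHz
  constant SHIFT270 : time := (PERIOD * 3) / 4;  -- 270 degrees = 7.5 ns
  signal sys_clk    : std_logic := '0';
  signal sys_clk270 : std_logic := '0';
  signal done       : boolean := false;
begin
  -- Ideal clock; stops toggling once the checks are done.
  sys_clk <= not sys_clk after PERIOD / 2 when not done else '0';

  -- Transport delay is needed here: the delay is longer than half a period,
  -- so the default inertial delay would swallow the pulses.
  sys_clk270 <= transport sys_clk after SHIFT270;

  check : process
    variable t_edge : time;
  begin
    wait until rising_edge(sys_clk);      -- 5 ns
    wait until rising_edge(sys_clk);      -- 15 ns, past start-up
    t_edge := now;
    wait until falling_edge(sys_clk270);  -- 17.5 ns
    assert now - t_edge = PERIOD / 4
      report "falling sys_clk270 is not 1/4 cycle after rising sys_clk"
      severity failure;

    wait until falling_edge(sys_clk);     -- 20 ns
    t_edge := now;
    wait until rising_edge(sys_clk270);   -- 22.5 ns
    assert now - t_edge = PERIOD / 4
      report "rising sys_clk270 is not 1/4 cycle after falling sys_clk"
      severity failure;

    report "both replacement edges land 1/4 cycle later" severity note;
    done <= true;
    wait;
  end process check;
end architecture sim;
```

At 100 MHz that quarter cycle is 2.5 ns, which is the extra time the returning data gets to settle before it is sampled.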