comp.arch.fpga | ddr clock issues

Open Cores DDR controller uses 2 DCM's to generate the clocks.

clk -> dcm0 -> clock used for fddr to produce true + negative ddr clocks
               feedback comes from true ddr clock
               fddr has hard wired 01 inputs for true clock,
               10 inputs for negative clock

clk -> dcm1 -> (0 clock)   bufg1 -> clock used for all ddr related internal logic
            -> (270 clock) bufg2 -> clock used for fddr's for DDR's data in lines
               feedback comes from the output of bufg1


dcm0 has a tunable parameter, phase shift of 30 ps. I've moved this all the way
to -530ps with no failure. It seems irrelevant.

I want to get rid of one of the DCM's, 2 seems excessive. Is it common to use
an fddr to get a clock to the outside this way? That is, an fddr has fixed inputs
(input0 <= '0', input1 <= '1') and so the fddr output is really just a data selector:
when the input clock is low you get input0, when high you get input1. Why
not route the clock through to the outside directly?

I've tried hanging the DDR's clock off of bufg1 (still going through fddr)
but it doesn't work reliably, I get flaky data.

Where can I find info about clock generation issues, specifically related to ddr?
I never would have come up with the scheme that seems to actually work in
this case. Is it possible to do with just one DCM?

Thanks--
Dave

-- 
David Ashley                http://www.xdr.com/dash
Embedded linux, device drivers, system architecture

Reply by David Ashley ● September 18, 2006

David Ashley wrote:
> Open Cores DDR controller uses 2 DCM's to generate the clocks.
> 
> clk -> dcm0 -> clock used for fddr to produce true + negative ddr clocks
>                          feedback comes from true ddr clock
>                          fddr has hard wired 01 inputs for true clock,
>                         10 inputs for negative clock
> 
> clk -> dcm1 -> (0 clock) bufg1 -> clock used for all ddr related
> internal logic
>                      -> (270 clock) bufg2 -> clock used for fddr's for
> DDR's data in lines
>                       feedback comes from the output of bufg1
> 
> 
> dcm0 has a tunable parameter, phase shift of 30 ps. I've moved this all
> the way
> to -530ps with no failure. It seems irrelevant.
> 
> I want to get rid of one of the DCM's, 2 seems excessive. Is it common
> to use
> an fddr to get a clock to the outside this way? That is, an fddr has
> fixed inputs
> (input0 <= '0', input1 <= '1') and so the fddr output is really just a
> data selector,
> when the input clock is low you get input0, when high you get output1. Why
> not route the clock through to the outside directly?
> 
> I've tried hanging the DDR's clock off of bufg1 (still going through fddr)
> but it doesn't work reliably, I get flaky data.
> 
> Where can I find info about clock generation issues, specifically
> related to ddr.
> I never would have come up with the scheme that seems to actually work in
> this case. Is it possible to do with just one DCM?
> 
> Thanks--
> Dave
> 

I found a xilinx app note, xapp802.pdf, which has a nice block
diagram of an approach with just one DCM on page 3.
It is related to virtex but I'd hope spartan-3e would be the same...

-Dave


-- 
David Ashley                http://www.xdr.com/dash
Embedded linux, device drivers, system architecture

Reply by Gabor ● September 19, 2006

David Ashley wrote:
[snip]
> I want to get rid of one of the DCM's, 2 seems excessive. Is it common
> to use
> an fddr to get a clock to the outside this way? That is, an fddr has
> fixed inputs
> (input0 <= '0', input1 <= '1') and so the fddr output is really just a
> data selector,
> when the input clock is low you get input0, when high you get output1. Why
> not route the clock through to the outside directly?
>
> I've tried hanging the DDR's clock off of bufg1 (still going through fddr)
> but it doesn't work reliably, I get flaky data.
>
> Where can I find info about clock generation issues, specifically
> related to ddr.

The FDDR is used to generate the external signal with the same
clock to output delay as the associated data lines.  Routing
a clock to an output buffer requires non-clock resources in
the Xilinx parts.  The FDDR takes the global clock (very low
skew) directly from the dedicated routing.  Its delay is matched
to the clock to out delay of the DDR flops on the DQ bus.  So
if you use a DCM and global clock resources to generate the
internal clocks for DQ and clock, you directly set the phase
relationship between the clock output and DQ.  When you
try to route the clock through an output buffer you are at the
mercy of the router, and even if you get the design to work
the timing may change if you re-build due to changes to
seemingly unrelated sections of the design.
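
The clock forwarding described above might look roughly like the sketch below. It assumes the FDDRRSE primitive from the Xilinx unisim library (newer families use a different output DDR primitive), and the entity and port names (ddr_clk_forward, sys_clk, ddr_clk_p, ddr_clk_n) are made up for illustration. Which data input gets '1' depends on which clock input carries the non-inverted clock, so the "01"/"10" wiring in the Open Cores source may not map onto D0/D1 the way it does here.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

library unisim;
use unisim.vcomponents.all;

-- Forward a global clock to the DDR clock pins through IOB DDR flip-flops,
-- so the clock outputs share the clock-to-out path of the DQ registers.
entity ddr_clk_forward is
  port (
    sys_clk   : in  std_logic;  -- global clock from a BUFG (assumed name)
    ddr_clk_p : out std_logic;  -- true DDR clock pin (hypothetical name)
    ddr_clk_n : out std_logic   -- inverted DDR clock pin (hypothetical name)
  );
end entity ddr_clk_forward;

architecture rtl of ddr_clk_forward is
  signal sys_clk_n : std_logic;
begin
  -- Local inversion; the tools normally absorb it into the IOB clock input.
  sys_clk_n <= not sys_clk;

  -- D0 is registered on the rising edge of C0 and D1 on the rising edge
  -- of C1, so Q is high while sys_clk is high: a copy of the clock.
  ck_true : FDDRRSE
    port map (
      Q  => ddr_clk_p,
      C0 => sys_clk,
      C1 => sys_clk_n,
      CE => '1',
      D0 => '1',
      D1 => '0',
      R  => '0',
      S  => '0'
    );

  -- Swapping the two constants gives the inverted clock.
  ck_inv : FDDRRSE
    port map (
      Q  => ddr_clk_n,
      C0 => sys_clk,
      C1 => sys_clk_n,
      CE => '1',
      D0 => '0',
      D1 => '1',
      R  => '0',
      S  => '0'
    );
end architecture rtl;
```

Because both flip-flops sit in IOBs and are clocked from dedicated clock routing, the phase between the forwarded clock and DQ is set by the DCM outputs rather than by general-purpose routing.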

Reply by David Ashley ● September 19, 2006

Gabor wrote:
> David Ashley wrote:
> [snip]
> 
>>I want to get rid of one of the DCM's, 2 seems excessive. Is it common
>>to use
>>an fddr to get a clock to the outside this way? That is, an fddr has
>>fixed inputs
>>(input0 <= '0', input1 <= '1') and so the fddr output is really just a
>>data selector,
>>when the input clock is low you get input0, when high you get output1. Why
>>not route the clock through to the outside directly?
>>
>>I've tried hanging the DDR's clock off of bufg1 (still going through fddr)
>>but it doesn't work reliably, I get flaky data.
>>
>>Where can I find info about clock generation issues, specifically
>>related to ddr.
> 
> 
> The FDDR is used to generate the external signal with the same
> clock to output delay as the associated data lines.  Routing
> a clock to an output buffer requires non-clock resources in
> the Xilinx parts.  The FDDR takes the global clock (very low
> skew) directly from the dedicated routing.  Its delay is matched
> to the clock to out delay of the DDR flops on the DQ bus.  So
> if you us a DCM and global clock resources to generate the
> internal clocks for DQ and clock, you directly set the phase
> relationship between the clock output and DQ.  When you
> try to route the clock through an output buffer you are at the
> mercy of the router, and even if you get the design to work
> the timing may change if you re-build due to chenges of
> seemingly unrelated sections of the design.
> 

In experiments I had been able to get rid of the fddr's on the
true + inverted DDR clock outputs, but I just did that to
see if it would work. It's pointless since the FDDR's are part of
the IOB's anyway and conserving them doesn't make them
available for any other function.

However I wasn't able to get rid of the 2nd DCM, and I'm
running out of ideas to try.

One thing of note -- this is on the spartan-3e starter board.
It supplies a 50 mhz clock. I run this through a DCM to produce
100 mhz, and that's used to feed the other 2 DCM's. I kind of
remember this is not a good idea?

Unfortunately (according to my understanding of the DCM's)
you can't both get a multiplied output clock from a DCM and
have the 0, 90, 180 and 270 phases of that clock. So I don't
know how to accomplish this other than stringing DCM's
together. Or get an external 100mhz crystal oscillator and put
it into the socket.

Thanks--
Dave

-- 
David Ashley                http://www.xdr.com/dash
Embedded linux, device drivers, system architecture

Reply by Tommy Thorn ● September 19, 2006

David Ashley wrote:
> One thing of note -- this is on the spartan-3e starter board.
> It supplies a 50 mhz clock. I run this through a DCM to produce
> 100 mhz, and that's use to feed the other 2 DCM's. I kind of
> remember this is not a good idea?

The whole issue of DDR clock management and pin constraints is an area 
I'm not too comfortable with. I wish X and A would include with their 
development boards a _simple_ example frobbing their SDRAM. Just enough to 
show that it's working, not a complete controller. I can design the 
logic for an SDRAM controller (DDR or SDR) just fine, but it seems 
every FPGA (and board) has a different clocking methodology and 
different constraint requirements.

David, I hope you find the solution and share it with us :-)

I assume there won't be too much difference between the ML401 (Virtex 4) 
and the Spartan 3E starter kit.

Tommy

Reply by David Ashley ● September 19, 2006

Tommy Thorn wrote:
> David Ashley wrote:
> 
>> One thing of note -- this is on the spartan-3e starter board.
>> It supplies a 50 mhz clock. I run this through a DCM to produce
>> 100 mhz, and that's use to feed the other 2 DCM's. I kind of
>> remember this is not a good idea?
> 
> 
> The whole issue of DDR clock management and pin constraints is an area
> I'm not too comfortable with. I wish X and A would include with their
> development boards _simple_ example frobbing their SDRAM. Just enough to
> show that it's working, not a complete controller. I can design the
> logic for controller for SDRAM (DDR or SDR) just fine, but it seems
> every FPGA (and board) have different clocking methodology and
> constraints requirements.
> 
> David, I hope you find the solution and share it with us :-)
> 
> I assume there won't be too much difference between the ML401 (Virtex 4)
> and the Spartan 3E starter kit.
> 
> Tommy

I will certainly share whatever I learn.

One thing just occurred to me. BTW I don't have any test equipment,
no 'scope, no logic analyzer, nothing. Just a crappy digital multimeter.
So I can't hook a scope up and look at the signals going into the DDR
itself.

For some reason I think the data going into the DDR is good. The
open cores controller does include logic to generate the DQS strobe
the DDR uses to latch input data. That approach would tend to
balance out timing problems -- the same logic that drives the data
also drives the DQS strobe, so they should sink or swim together
I suppose.

But the open cores DDR doesn't make use of the DQS strobe generated
by the DDR device itself. I'm only trying to run at 100 mhz. In that
case xilinx app notes say the timing is adequate so the DQS strobe isn't
needed to capture data reliably. Maybe the timing would get easier if
the logic made use of the DQS strobe from the DDR.

I have a feeling adding some constraints would make the thing work
with a single DCM. Unfortunately I have no clue what constraints to
add, as I don't know what's going wrong (and don't know much about
constraints writing anyway).

To get around the lack of test equipment, when the thing wasn't
working before I created a module called "vgatext" which outputs
a 96x40 stable text display to the vga outputs. Then I set up so in
the event of a ddr error it will display the desired data vs the actual
data -- and saw it was an off-by-one problem. That was a problem in
my logic but not in the timing.
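
A capture block in the spirit of that debug approach could be sketched as below. This is not the vgatext module itself, only a guess at the kind of logic that would feed it: on the first mismatch it freezes the expected and actual words so a display can show them. All names (ddr_err_capture, check_en, err_seen and so on) are hypothetical, and the 32-bit width is assumed from the controller's 32-bit output register.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Latch the first expected/actual mismatch seen during read-verify.
entity ddr_err_capture is
  port (
    clk      : in  std_logic;
    rst      : in  std_logic;                      -- synchronous, clears the flag
    check_en : in  std_logic;                      -- high when read data is valid
    expected : in  std_logic_vector(31 downto 0);  -- value that was written
    actual   : in  std_logic_vector(31 downto 0);  -- value read back
    err_seen : out std_logic;                      -- sticky error flag
    err_exp  : out std_logic_vector(31 downto 0);  -- expected word at first error
    err_act  : out std_logic_vector(31 downto 0)   -- actual word at first error
  );
end entity ddr_err_capture;

architecture rtl of ddr_err_capture is
  signal err_q : std_logic := '0';
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        err_q <= '0';
      elsif check_en = '1' and err_q = '0' and expected /= actual then
        -- Only the first failure is kept; later ones are ignored until reset.
        err_q   <= '1';
        err_exp <= expected;
        err_act <= actual;
      end if;
    end if;
  end process;

  err_seen <= err_q;
end architecture rtl;
```

Keeping only the first failing pair makes patterns such as an off-by-one word easy to spot, since later errors cannot overwrite it.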

-Dave

-- 
David Ashley                http://www.xdr.com/dash
Embedded linux, device drivers, system architecture

Reply by Nico Coesel ● September 19, 2006

David Ashley <dash@nowhere.net.dont.email.me> wrote:

>Open Cores DDR controller uses 2 DCM's to generate the clocks.
>
>clk -> dcm0 -> clock used for fddr to produce true + negative ddr clocks
>                         feedback comes from true ddr clock
>                         fddr has hard wired 01 inputs for true clock,
>                        10 inputs for negative clock
>
>clk -> dcm1 -> (0 clock) bufg1 -> clock used for all ddr related
>internal logic
>                     -> (270 clock) bufg2 -> clock used for fddr's for
>DDR's data in lines
>                      feedback comes from the output of bufg1
>
>
>dcm0 has a tunable parameter, phase shift of 30 ps. I've moved this all
>the way
>to -530ps with no failure. It seems irrelevant.
>
>I want to get rid of one of the DCM's, 2 seems excessive. Is it common
>to use
>an fddr to get a clock to the outside this way? That is, an fddr has

All you need is a normal clock and a 90 degrees phase shifted clock.
The whole clocking outside the fpga thing is unnecessary. If you place
the output flipflops inside the IOBs and use an fddr in the IOB to
replicate the internal clock, all signals connected to the DDR memory
will have the same delay.

-- 
Reply to nico@nctdevpuntnl (punt=.)
Companies and shops can be found at www.adresboekje.nl

Reply by David Ashley ● September 19, 2006

Nico Coesel wrote:
> All you need is a normal clock and a 90 degrees phase shifted clock.
> The whole clocking outside the fpga thing is unnecessary. If you place
> the output flipflops inside the IOBs and use an fddr in the IOB to
> replicate the internal clock, all signals connected to the DDR memory
> will have the same delay.

But the DDR spec says the DQS strobe for data written by the
fpga to the DDR must be center aligned. The DQS is in phase with the
DDR clock. That means the data must be put on the lines
1/2 of 1/2 of a clock cycle (a quarter cycle) early for proper alignment.

This requires a clock that is 270 degrees out of phase from the
DDR's clock. This is the clock used for the data lines going into
the DDR.

I don't understand the "clocking outside the fpga" you mention.
The fpga currently has one 50 mhz external clock source. I
run that through a DCM to make it 100 mhz. Then in order for
the DDR to work I need to use two more DCM's. One is used
to make the DDR clocks (positive and negative). The other is
used for everything else.

-Dave

-- 
David Ashley                http://www.xdr.com/dash
Embedded linux, device drivers, system architecture

Reply by Gabor ● September 19, 2006

David Ashley wrote:
[snip]
> But the open cores DDR doesn't make use of the DQS strobe generated
> by the DDR device itself. I'm only trying to run at 100 mhz. In that
> case xilinx app notes say the timing is adequate so the DQS strobe isn't
> needed to capture data reliably. Maybe the timing would get easier if
> the logic made use of the DQS strobe from the DDR.
>

I'm doing pretty much the same thing with Virtex 2 (similar architecture
to Spartan 3) on a proprietary board.  This board has a 66.66 MHz
clock that is doubled to run the DDR at 133 MHz (266 DDR).  I
do not use the DQS inputs for sampling data.  I did need to tweak
the delay in my DCM's to get reliable sampling.  I did not use any
expensive test equipment for this, I just used the variable delay
mode of the DCM to run tests at various phases and centered
the final fixed value within the area that seemed to work.
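
A phase sweep like this needs some logic to drive the DCM's phase-shift port. The block below is only a sketch of a single-step controller, not the code used in the design described here. It assumes a DCM configured with CLKOUT_PHASE_SHIFT set to "VARIABLE" and relies on the documented handshake: PSEN high for one PSCLK cycle with PSINCDEC giving the direction, then wait for PSDONE. The entity name and the step_req/step_up/busy ports are hypothetical, and the size of one step depends on the device family.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Issue one DCM phase-shift step per request and wait for completion.
entity dcm_phase_step is
  port (
    psclk    : in  std_logic;         -- same clock that drives the DCM PSCLK pin
    rst      : in  std_logic;
    step_req : in  std_logic;         -- one-cycle pulse: request a single step
    step_up  : in  std_logic;         -- '1' = increment, '0' = decrement
    psdone   : in  std_logic;         -- from DCM PSDONE
    psen     : out std_logic := '0';  -- to DCM PSEN
    psincdec : out std_logic := '0';  -- to DCM PSINCDEC
    busy     : out std_logic          -- high while a step is in flight
  );
end entity dcm_phase_step;

architecture rtl of dcm_phase_step is
  type state_t is (IDLE, WAIT_DONE);
  signal state : state_t := IDLE;
begin
  process (psclk)
  begin
    if rising_edge(psclk) then
      psen <= '0';  -- default, so PSEN is only ever one cycle wide
      if rst = '1' then
        state <= IDLE;
      else
        case state is
          when IDLE =>
            if step_req = '1' then
              psincdec <= step_up;
              psen     <= '1';
              state    <= WAIT_DONE;
            end if;
          when WAIT_DONE =>
            -- Requests arriving here are ignored until the DCM reports done.
            if psdone = '1' then
              state <= IDLE;
            end if;
        end case;
      end if;
    end if;
  end process;

  busy <= '0' when state = IDLE else '1';
end architecture rtl;
```

A test harness would step through the range, run the read-verify test at each setting, note where failures begin and end, and then fix the phase at the middle of the passing window.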

At 100 MHz I would expect the timing margins to be quite good
even in the slowest speed grade parts.  I'm using Virtex 2 -5
speed grade in my 133 MHz design.

> I have a feeling adding some constraints would make the thing work
> with a single DCM. Unfortunately I have no clue what constraints to
> add, as I don't know what's going wrong (and don't know much about
> constraints writing anyway).
>

The problem with  a single DCM is that you need to make up
for phase differences in the board routing.  Signals to the DDR
memory arrive there some prop. delay after they leave the FPGA.
At the memory end they need to meet setup and hold time to
the clock as it arrives at the memory, usually at the same
board routing delay as the clock.  So if your clock and data/
address/control outputs use the same internal clock, you
would need to use board routing or some other delay element
external to the FPGA to ensure hold time is met at the memory.

Then the data returning from the memory shows up 2 board
prop. delays from the driven clock, plus the clock to output
timing specified in the memory datasheet.  So the sampling
point isn't exactly centered within the outgoing clock half-
period.  So your sampling clock may need to be off by some
phase other than 90 degrees from the clock driving your
outputs.  All of this is pretty hard to accomplish with one
DCM, IMHO.  And just adding timing constraints without the
mechanism to meet them makes life miserable for the tools,
which usually fail miserably in response (they have only
internal routing delays to make up your requested timing).

Reply by David Ashley ● September 19, 2006

David Ashley wrote:
> I will certainly share whatever I learn.

I got my simple write/read-verify system to work. I was able
to get rid of one of the DCM's, so I only need 2.

DCM #1 takes 50 mhz input and I use the 2X output to
drive a clock buffer. This is the tclock signal. Feedback
comes from the clock buffer.

DCM #2 takes tclock and produces 4 phase output.
The 0 and 270 signals drive 2 clock buffers. The 0
clock buffered version goes back into the feedback input
on the DCM. These signals are sys_clk and sys_clk270.
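
The two-DCM chain just described could be instantiated roughly as follows. This is a reconstruction from the description, not the actual source. It assumes the generic DCM and BUFG primitives from the unisim library, and that the unisim component declaration supplies defaults for the unused DCM inputs. The entity name, the clk_50, rst and locked ports, and the internal signal names are invented for the example; tclock, sys_clk and sys_clk270 are the names used in the post. Holding the second DCM in reset until the first has locked is an addition that follows general guidance for cascaded DCMs, not something mentioned in the post.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

library unisim;
use unisim.vcomponents.all;

entity ddr_clocks is
  port (
    clk_50     : in  std_logic;  -- 50 MHz board oscillator (assumed name)
    rst        : in  std_logic;
    sys_clk    : out std_logic;  -- 100 MHz, 0 degrees
    sys_clk270 : out std_logic;  -- 100 MHz, 270 degrees
    locked     : out std_logic   -- second DCM has locked
  );
end entity ddr_clocks;

architecture rtl of ddr_clocks is
  signal clk2x_raw, tclock    : std_logic;
  signal clk0_raw, clk270_raw : std_logic;
  signal sys_clk_i            : std_logic;
  signal locked1, rst2        : std_logic;
begin
  -- DCM #1: double 50 MHz to 100 MHz, feedback taken from the buffered 2X clock.
  dcm1 : DCM
    generic map (
      CLKIN_PERIOD => 20.0,
      CLK_FEEDBACK => "2X"
    )
    port map (
      CLKIN  => clk_50,
      CLKFB  => tclock,
      RST    => rst,
      CLK2X  => clk2x_raw,
      LOCKED => locked1
    );

  bufg_t : BUFG port map (I => clk2x_raw, O => tclock);

  -- Keep DCM #2 in reset until DCM #1 is delivering a stable clock.
  rst2 <= rst or not locked1;

  -- DCM #2: produce the 0 and 270 degree phases of the 100 MHz clock.
  dcm2 : DCM
    generic map (
      CLKIN_PERIOD => 10.0,
      CLK_FEEDBACK => "1X"
    )
    port map (
      CLKIN  => tclock,
      CLKFB  => sys_clk_i,
      RST    => rst2,
      CLK0   => clk0_raw,
      CLK270 => clk270_raw,
      LOCKED => locked
    );

  bufg_0   : BUFG port map (I => clk0_raw,   O => sys_clk_i);
  bufg_270 : BUFG port map (I => clk270_raw, O => sys_clk270);

  sys_clk <= sys_clk_i;
end architecture rtl;
```

The rest of the design would normally be held in reset until locked goes high.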

FDDR's are used to produce the DDR's clock. Their inputs
are hardwired for "01" for the true clock, and "10" for
the negative clock.  Both FDDR's take clock from
sys_clk and inverted sys_clk. The inverter is implicit
in the FDDR configuration, no delay penalty exists.

Here's the trick: The original open cores DDR controller
source sampled the data from the DDR on sys_clk rising
and falling edge. I instead push out the sampling by
1/4 of a cycle:
rising_edge(sys_clk)  replaced by falling_edge(sys_clk270)
falling_edge(sys_clk) replaced by rising_edge(sys_clk270)

Then I made a slight tweak to get the sampled data back
into the sys_clk domain as required elsewhere. It works
fine. I had a feeling the problem was in the sampling side
since no special machinery existed to sample in the middle
of when it was valid. The setup time was not being met.

Here's a sample of the before code:
-- **** CODE BEFORE FIX
      process (sys_clk)
      begin
         if rising_edge(sys_clk) then

            -- sample HI-data word with rising edge
            data_hi_q <= data;

            -- store HI- and LO- data word in 32bit output register
            data_out_q <= data_hi_q & data_lo2_q;

         end if;
      end process;
-- ...
      process (sys_clk)
      begin
         if falling_edge(sys_clk) then

            -- sample LO- word with falling edge
            data_lo1_q <= data;

            -- 1 clock additional delay to store HI- and LO-word
            -- with the next rising edge as 32bit word
            data_lo2_q <= data_lo1_q;
         end if;
      end process;

-- ***** CODE AFTER FIX

      process (sys_clk270)
      begin
         if falling_edge(sys_clk270) then

            -- sample HI-data word a quarter cycle after the rising
            -- edge of sys_clk (falling edge of sys_clk270)
            data_hi_q <= data;

         end if;
      end process;

      process (sys_clk) -- (DA) fix to get back into sys_clk domain
      begin
         if rising_edge(sys_clk) then
            -- store HI- and LO- data word in 32bit output register
            data_out_q <= data_hi_q & data_lo2_q;
         end if;
      end process;

-- ...
      process (sys_clk270)
      begin
         if rising_edge(sys_clk270) then

            -- sample LO- word a quarter cycle after the falling
            -- edge of sys_clk (rising edge of sys_clk270)
            data_lo1_q <= data;

            -- 1 clock additional delay to store HI- and LO-word
            -- with the next rising edge as 32bit word
            data_lo2_q <= data_lo1_q;
         end if;
      end process;


Hope this is of use to other people.
-Dave

-- 
David Ashley                http://www.xdr.com/dash
Embedded linux, device drivers, system architecture
