David Ashley <dash@nowhere.net.dont.email.me> wrote:

>Nico Coesel wrote:
>> AFAIK the opencores ddr controller uses some sort of scheme which
>> routes the clock to the outside and pulls it back in again. This is
>> totally unnecessary IMHO.
>
>Yep you're right, that was the feedback line for one of the DCM's.
>Current design works but has no feedback from the outside as
>you suggest. Original design was right on the edge as regards
>sampling the DDR's output data.
>
>> For the DDR you'll need 2 DCMs: 1 to turn 50MHz into 100MHz and 1 to
>> get a clock which is 90 degrees out of phase. fddrs have an internal
>> inverter in their clock inputs so 1 clock to drive these is
>> sufficient.
>
>You're exactly right. Also DCM -> DCM seems to work ok, however
>I'm ignoring the "locked" bit on the 50->100 DCM and the system
>only pays attention to the locked bit on the 2nd DCM. This is
>probably bad.

By the way, there is a Spartan3 issue with daisy chaining DCMs. See
the other thread about 'product lifetime'. ISE 7.1 (dunno about the
other ISE versions) will warn you about this when routing the design.

-- 
Reply to nico@nctdevpuntnl (punt=.)
Find companies and shops at www.adresboekje.nl

Gabor,

In the DCM status, there is the "clock lost" bit.  For the CLKFX, there
is also the "clock stopped" bit.  The DCM is a digital synchronous state
machine, so loss of input clock means that the lock bit, which is a
state, will never change.  These other two status bits are there to tell
you what happened (provide more information).

Good post,

Austin

David Ashley wrote:

>
> You're exactly right. Also DCM -> DCM seems to work ok, however
> I'm ignoring the "locked" bit on the 50->100 DCM and the system
> only pays attention to the locked bit on the 2nd DCM. This is
> probably bad.
>
> -Dave
>
> --
> David Ashley                http://www.xdr.com/dash
> Embedded linux, device drivers, system architecture

Beware of "locked" bits on the Xilinx DCM's.  Once locked, they
tend to continue to report locked even if the input clock goes
away.  You need to look at the "status" outputs to get the
whole picture, and note that you must reset the DCM if
you want it to attempt re-lock after lock is lost.
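
A minimal sketch of such a check, assuming the 8-bit STATUS bus of the
Spartan-3 / Virtex-2 DCM, where STATUS(1) flags a stopped CLKIN and
STATUS(2) a stopped CLKFX (verify the bit assignment against the data
sheet for your part). The entity name dcm_watch, its port names, the
free-running ref_clk and the reset pulse length are all made up for
illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helper: qualifies LOCKED with the DCM STATUS bits and
-- requests a DCM reset when the input clock is flagged as stopped or
-- when LOCKED drops.
entity dcm_watch is
   port (
      ref_clk    : in  std_logic;  -- free-running clock, NOT from the watched DCM
      dcm_locked : in  std_logic;  -- DCM LOCKED output
      dcm_status : in  std_logic_vector(7 downto 0);  -- DCM STATUS output
      clocks_ok  : out std_logic;  -- '1' only when locked and nothing is flagged
      dcm_rst    : out std_logic   -- connect to the DCM RST input
   );
end entity dcm_watch;

architecture rtl of dcm_watch is
   signal locked_s, locked_q, locked_prev : std_logic := '0';
   signal status_s, status_q : std_logic_vector(2 downto 1) := "00";
   signal rst_cnt : integer range 0 to 7 := 0;
begin
   process (ref_clk)
   begin
      if rising_edge(ref_clk) then
         -- two-stage synchronizers: the DCM outputs are asynchronous to ref_clk
         locked_s    <= dcm_locked;
         locked_q    <= locked_s;
         locked_prev <= locked_q;
         status_s    <= dcm_status(2 downto 1);
         status_q    <= status_s;

         if rst_cnt /= 0 then
            rst_cnt <= rst_cnt - 1;   -- stretch the reset pulse
         elsif status_q(1) = '1' or (locked_prev = '1' and locked_q = '0') then
            rst_cnt <= 7;             -- CLKIN stopped or lock lost: request re-lock
         end if;
      end if;
   end process;

   dcm_rst   <= '1' when rst_cnt /= 0 else '0';
   clocks_ok <= '1' when locked_q = '1' and status_q = "00" and rst_cnt = 0
                else '0';
end architecture rtl;
```

The 7-cycle pulse is only a placeholder. The data sheet gives the
minimum RST width in CLKIN cycles, and a reset issued while the input
clock is still dead will simply be repeated until the clock comes back.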

In the older parts (Spartan 2) with DLL's, the 2x clock output
drives a 1x clock when the DLL is not locked.  On those parts
I actually use this "feature" to detect lock rather than using
the "locked" output of the DLL (they have no status bus).

Regards,
Gabor

Nico Coesel wrote:
> AFAIK the opencores ddr controller uses some sort of scheme which
> routes the clock to the outside and pulls it back in again. This is
> totally unnecessary IMHO.

Yep you're right, that was the feedback line for one of the DCM's.
Current design works but has no feedback from the outside as
you suggest. Original design was right on the edge as regards
sampling the DDR's output data.

> For the DDR you'll need 2 DCMs: 1 to turn 50MHz into 100MHz and 1 to
> get a clock which is 90 degrees out of phase. fddrs have an internal
> inverter in their clock inputs so 1 clock to drive these is
> sufficient.

You're exactly right. Also DCM -> DCM seems to work ok, however
I'm ignoring the "locked" bit on the 50->100 DCM and the system
only pays attention to the locked bit on the 2nd DCM. This is
probably bad.
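
One way to stop ignoring the first "locked" bit is to hold the second
DCM in reset until the first one reports lock. The following is only a
sketch under assumptions: the entity name dcm_chain_reset, the port
names and the 15-cycle hold-off are hypothetical, and it does not
address the Spartan3 daisy-chaining issue mentioned elsewhere in this
thread.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical sequencer for two cascaded DCMs.
entity dcm_chain_reset is
   port (
      clk_in  : in  std_logic;  -- external 50 MHz clock (free running)
      locked1 : in  std_logic;  -- LOCKED of the 50->100 DCM
      locked2 : in  std_logic;  -- LOCKED of the second DCM
      rst2    : out std_logic;  -- drives RST of the second DCM
      ready   : out std_logic   -- '1' when both DCMs are locked
   );
end entity dcm_chain_reset;

architecture rtl of dcm_chain_reset is
   signal l1_sync : std_logic_vector(1 downto 0) := "00";
   signal l2_sync : std_logic_vector(1 downto 0) := "00";
   signal hold    : integer range 0 to 15 := 15;
begin
   process (clk_in)
   begin
      if rising_edge(clk_in) then
         -- synchronize both LOCKED flags into the clk_in domain
         l1_sync <= l1_sync(0) & locked1;
         l2_sync <= l2_sync(0) & locked2;

         if l1_sync(1) = '0' then
            hold <= 15;           -- first DCM not locked: restart the hold-off
         elsif hold /= 0 then
            hold <= hold - 1;     -- keep DCM #2 in reset a little longer
         end if;
      end if;
   end process;

   rst2  <= '1' when hold /= 0 else '0';
   ready <= '1' when hold = 0 and l1_sync(1) = '1' and l2_sync(1) = '1'
            else '0';
end architecture rtl;
```

The rest of the system would then use "ready" in place of the second
DCM's locked bit.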

-Dave

-- 
David Ashley                http://www.xdr.com/dash
Embedded linux, device drivers, system architecture

"Gabor" <gabor@alacron.com> wrote:

>
>David Ashley wrote:
>[snip]
>> But the open cores DDR doesn't make use of the DQS strobe generated
>> by the DDR device itself. I'm only trying to run at 100 mhz. In that
>> case xilinx app notes say the timing is adequate so the DQS strobe isn't
>> needed to capture data reliably. Maybe the timing would get easier if
>> the logic made use of the DQS strobe from the DDR.
>>
>
>I'm doing pretty much the same thing with Virtex 2 (similar
>architecture to Spartan 3) on a proprietary board.  This board has a
>66.66 MHz clock that is doubled to run the DDR at 133 MHz (266 DDR).  I
>do not use the DQS inputs for sampling data.  I did need to tweak
>the delay in my DCM's to get reliable sampling.  I did not use any
>expensive test equipment for this, I just used the variable delay
>mode of the DCM to run tests at various phases and centered
>the final fixed value within the area that seemed to work.
>
>At 100 MHz I would expect the timing margins to be quite good
>even in the slowest speed grade parts.  I'm using Virtex 2 -5
>speed grade in my 133 MHz design.
>
>> I have a feeling adding some constraints would make the thing work
>> with a single DCM. Unfortunately I have no clue what constraints to
>> add, as I don't know what's going wrong (and don't know much about
>> constraints writing anyway).
>>
>
>The problem with a single DCM is that you need to make up
>for phase differences in the board routing.  Signals to the DDR
>memory arrive there some prop. delay after they leave the FPGA.
>At the memory end they need to meet setup and hold time to
>the clock as it arrives at the memory, usually at the same
>board routing delay as the clock.  So if your clock and data/
>address/control outputs use the same internal clock, you
>would need to use board routing or some other delay element
>external to the FPGA to ensure hold time is met at the memory.

All these problems go away if you drive the control signals at half the
DDR clock frequency. This is not going to cost performance since all
DDR commands need 2 clock cycles to execute anyway. The only signal
that needs to be fast is CS (which also happens to be the least loaded
line in a larger memory system).

Clocking data into the memory uses DQS which has the same delay as the
DQ lines (if your PCB layout is routed as it is supposed to be).

>Then the data returning from the memory shows up 2 board
>prop. delays from the driven clock, plus the clock to output
>timing specified in the memory datasheet.  So the sampling
>point isn't exactly centered within the outgoing clock half-
>period.  So your sampling clock may need to be off by some
>phase other than 90 degrees from the clock driving your
>outputs.  All of this is pretty hard to accomplish with one
>DCM, IMHO.  And just adding timing constraints without the
>mechanism to meet them makes life miserable on the tools,
>which usually fail miserably in response (they have only
>internal routing delays to make up your requested timing).

If you delay DQS by the IOBDELAY and use this signal to clock DQ
(without IOBDELAY) into the IOB flipflops, then setup and hold timing
should be met (with the proper constraints). But beware, there are
severe limits on how the IOBs must be arranged and you may need to
match the FPGA speed with the memory speed.

-- 
Reply to nico@nctdevpuntnl (punt=.)
Find companies and shops at www.adresboekje.nl

David Ashley <dash@nowhere.net.dont.email.me> wrote:

>Nico Coesel wrote:
>> All you need is a normal clock and a 90 degrees phase shifted clock.
>> The whole clocking outside the fpga thing is unnecessary. If you place
>> the output flipflops inside the IOBs and use an fddr in the IOB to
>> replicate the internal clock, all signals connected to the DDR memory
>> will have the same delay.
>
>But the DDR spec says the DQS strobe for data written to the
>DDR must be center aligned. The DQS is in phase with the
>DDR clock. That means the data must be put on the lines
>1/4 of a clock cycle early for proper alignment.
>
>This requires a clock that is 270 degrees out of phase from the
>DDR's clock. This is the clock used for the data lines going into
>the DDR.

Yes.

>I don't understand the "clocking outside the fpga" you mention.

AFAIK the opencores ddr controller uses some sort of scheme which
routes the clock to the outside and pulls it back in again. This is
totally unnecessary IMHO.

>The fpga currently has one 50 mhz external clock source. I
>run that through a DCM to make it 100 mhz. Then in order for
>the DDR to work I need to use two more DCM's. One is used
>to make the DDR clocks (positive and negative). The other is
>used for everything else.

For the DDR you'll need 2 DCMs: 1 to turn 50MHz into 100MHz and 1 to
get a clock which is 90 degrees out of phase. fddrs have an internal
inverter in their clock inputs so 1 clock to drive these is
sufficient.

Both DCMs can also be used to create a divided clock, and a 200MHz
clock.

-- 
Reply to nico@nctdevpuntnl (punt=.)
Find companies and shops at www.adresboekje.nl

David Ashley wrote:
> Hope this is of use to other people.
> -Dave
> 

I've gotten email asking for the source, so I put it up. It can
be found here:

http://www.xdr.com/dash/fpga/

It's targeted to a linux build environment. It needs unisim
to be in the right place in order to build as is...or tweak the
Makefile.

It's a pretty much identical copy of the open cores ddr
controller, except I removed one DCM, and I wrapped
it all in a synthesizable tester targeted to the
spartan-3e starter board. The test just fills up memory
with a non-repeating pattern, then reads it back out.
If the pattern matches an LED stays lit. It keeps doing
this forever.

-Dave

-- 
David Ashley                http://www.xdr.com/dash
Embedded linux, device drivers, system architecture

Gabor wrote:
> David Ashley wrote:
> [snip]
> I'm doing pretty much the same thing with Virtex 2 (similar
> architecture to Spartan 3) on a proprietary board.  This board has a
> 66.66 MHz clock that is doubled to run the DDR at 133 MHz (266 DDR).  I
> do not use the DQS inputs for sampling data.  I did need to tweak
> the delay in my DCM's to get reliable sampling.  I did not use any
> expensive test equipment for this, I just used the variable delay
> mode of the DCM to run tests at various phases and centered
> the final fixed value within the area that seemed to work.

See other email in this thread for details. I got it working
by sampling data from the DDR on the 90 degree phase
clock, now it works fine. No tweaking of the DCM necessary.
And I'm only using one DCM.

The DDR's DQS output transitions right when the data
becomes valid out of the DDR. But the DDR controller
has to transition the DQS right in the middle of the data
going to the DDR being valid. This is hardly fair. I wish
there wasn't even the DQS signal, it's just a PITA.

-Dave

-- 
David Ashley                http://www.xdr.com/dash
Embedded linux, device drivers, system architecture

David Ashley wrote:
> I will certainly share whatever I learn.

I got my simple write/read-verify system to work. I was able
to get rid of one of the DCM's, so I only need 2.

DCM #1 takes 50 mhz input and I use the 2X output to
drive a clock buffer. This is the tclock signal. Feedback
comes from the clock buffer.

DCM #2 takes tclock and produces 4 phase output.
The 0 and 270 signals drive 2 clock buffers. The 0
clock buffered version goes back into the feedback input
on the DCM. These signals are sys_clk and sys_clk270.
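
For anyone who wants to see that clock tree spelled out, here is a
rough sketch using the unisim DCM, IBUFG and BUFG primitives. It is not
the code from my build: the entity name ddr_clocks, the rst input, the
combined locked output and the choice to hold DCM #2 in reset until
DCM #1 locks are all assumptions added for illustration, and the
generics are left at their defaults apart from the ones shown.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
library unisim;
use unisim.vcomponents.all;

-- Hypothetical wrapper for the two-DCM clock tree described above.
entity ddr_clocks is
   port (
      clk50      : in  std_logic;  -- 50 MHz board oscillator
      rst        : in  std_logic;  -- assumed external reset
      sys_clk    : out std_logic;  -- 100 MHz, 0 degrees
      sys_clk270 : out std_logic;  -- 100 MHz, 270 degrees
      locked     : out std_logic   -- both DCMs locked
   );
end entity ddr_clocks;

architecture rtl of ddr_clocks is
   signal clk50_buf        : std_logic;
   signal clk2x, tclock    : std_logic;
   signal clk0, clk270     : std_logic;
   signal sys_clk_i        : std_logic;
   signal locked1, locked2 : std_logic;
   signal rst2             : std_logic;
begin
   in_buf : IBUFG port map (I => clk50, O => clk50_buf);

   -- DCM #1: 50 MHz in, 2X output buffered as tclock, feedback from the buffer
   dcm1 : DCM
      generic map (CLKIN_PERIOD => 20.0, CLK_FEEDBACK => "2X")
      port map (
         CLKIN => clk50_buf, CLKFB => tclock, RST => rst,
         DSSEN => '0', PSCLK => '0', PSEN => '0', PSINCDEC => '0',
         CLK2X => clk2x, LOCKED => locked1);

   buf_t : BUFG port map (I => clk2x, O => tclock);

   -- assumption: keep DCM #2 in reset until DCM #1 has locked
   rst2 <= rst or not locked1;

   -- DCM #2: tclock in, 0 and 270 degree outputs, feedback from buffered CLK0
   dcm2 : DCM
      generic map (CLKIN_PERIOD => 10.0, CLK_FEEDBACK => "1X")
      port map (
         CLKIN => tclock, CLKFB => sys_clk_i, RST => rst2,
         DSSEN => '0', PSCLK => '0', PSEN => '0', PSINCDEC => '0',
         CLK0 => clk0, CLK270 => clk270, LOCKED => locked2);

   buf_0   : BUFG port map (I => clk0,   O => sys_clk_i);
   buf_270 : BUFG port map (I => clk270, O => sys_clk270);

   sys_clk <= sys_clk_i;
   locked  <= locked1 and locked2;
end architecture rtl;
```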

FDDR's are used to produce the DDR's clock. Their inputs
are hardwired for "01" for the true clock, and "10" for
the negative clock.  Both FDDR's take clock from
sys_clk and inverted sys_clk. The inverter is implicit
in the FDDR configuration, no delay penalty exists.
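
A sketch of that clock forwarding, assuming the unisim FDDRRSE
primitive (on Spartan-3E the corresponding primitive is ODDR2, so check
which one your tool version expects). The entity and port names are
hypothetical, and I am assuming D0 is the value driven after the rising
edge of C0, so D0='1', D1='0' gives the true clock.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
library unisim;
use unisim.vcomponents.all;

-- Hypothetical wrapper: forwards sys_clk to the DDR as a differential pair.
entity ddr_clk_forward is
   port (
      sys_clk   : in  std_logic;
      ddr_clk   : out std_logic;  -- true clock to the DDR
      ddr_clk_n : out std_logic   -- inverted clock to the DDR
   );
end entity ddr_clk_forward;

architecture rtl of ddr_clk_forward is
   signal sys_clk_n : std_logic;
begin
   -- the tools are expected to absorb this inverter into the IOB clock input
   sys_clk_n <= not sys_clk;

   -- true clock: high after the rising edge of sys_clk, low after the falling edge
   clk_p : FDDRRSE
      port map (Q => ddr_clk, C0 => sys_clk, C1 => sys_clk_n, CE => '1',
                D0 => '1', D1 => '0', R => '0', S => '0');

   -- inverted clock: same flip-flop type with the data inputs swapped
   clk_n : FDDRRSE
      port map (Q => ddr_clk_n, C0 => sys_clk, C1 => sys_clk_n, CE => '1',
                D0 => '0', D1 => '1', R => '0', S => '0');
end architecture rtl;
```

Because the clock leaves through the same kind of IOB flip-flop as the
other DDR signals, it picks up the same clock-to-pad delay they do.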

Here's the trick: The original open cores DDR controller
source sampled the data from the DDR on sys_clk rising
and falling edge. I instead push out the sampling by
1/4 of a cycle:
rising_edge(sys_clk)  replaced by falling_edge(sys_clk270)
falling_edge(sys_clk) replaced by rising_edge(sys_clk270)

Then I made a slight tweak to get the sampled data back
into the sys_clk domain as required elsewhere. It works
fine. I had a feeling the problem was in the sampling side
since no special machinery existed to sample in the middle
of when it was valid. The setup time was not being met.

Here's a sample of the before code:
-- **** CODE BEFORE FIX
      process (sys_clk)
      begin
         if rising_edge(sys_clk) then

            -- sample HI-data word with rising edge
            data_hi_q <= data;

            -- store HI- and LO- data word in 32bit output register
            data_out_q <= data_hi_q & data_lo2_q;

         end if;
      end process;
-- ...
      process (sys_clk)
      begin
         if falling_edge(sys_clk) then

            -- sample LO- word with falling edge
            data_lo1_q <= data;

            -- 1 clock additional delay to store HI- and LO-word
            -- with the next rising edge as 32bit word
            data_lo2_q <= data_lo1_q;
         end if;
      end process;

-- ***** CODE AFTER FIX

      process (sys_clk270)
      begin
         if falling_edge(sys_clk270) then

            -- sample HI-data word 1/4 cycle after the rising edge of sys_clk
            data_hi_q <= data;

         end if;
      end process;

      process (sys_clk) -- (DA) fix to get back into sys_clk domain
      begin
         if rising_edge(sys_clk) then
            -- store HI- and LO- data word in 32bit output register
            data_out_q <= data_hi_q & data_lo2_q;
         end if;
      end process;

-- ...
      process (sys_clk270)
      begin
         if rising_edge(sys_clk270) then

            -- sample LO- word 1/4 cycle after the falling edge of sys_clk
            data_lo1_q <= data;

            -- 1 clock additional delay to store HI- and LO-word
            -- with the next rising edge as 32bit word
            data_lo2_q <= data_lo1_q;
         end if;
      end process;


Hope this is of use to other people.
-Dave

-- 
David Ashley                http://www.xdr.com/dash
Embedded linux, device drivers, system architecture

David Ashley wrote:
[snip]
> But the open cores DDR doesn't make use of the DQS strobe generated
> by the DDR device itself. I'm only trying to run at 100 mhz. In that
> case xilinx app notes say the timing is adequate so the DQS strobe isn't
> needed to capture data reliably. Maybe the timing would get easier if
> the logic made use of the DQS strobe from the DDR.
>

I'm doing pretty much the same thing with Virtex 2 (similar
architecture to Spartan 3) on a proprietary board.  This board has a
66.66 MHz clock that is doubled to run the DDR at 133 MHz (266 DDR).  I
do not use the DQS inputs for sampling data.  I did need to tweak
the delay in my DCM's to get reliable sampling.  I did not use any
expensive test equipment for this, I just used the variable delay
mode of the DCM to run tests at various phases and centered
the final fixed value within the area that seemed to work.
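
For reference, a minimal sketch of the stepping logic for such a sweep,
assuming the DCM is configured with CLKOUT_PHASE_SHIFT => "VARIABLE"
and its PSCLK, PSEN, PSINCDEC and PSDONE pins are wired to this block.
The entity name ps_step and the step_req / step_up / busy ports are
hypothetical, and the memory test that decides pass or fail at each
phase is not shown.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical single-step controller for the DCM variable phase shifter.
entity ps_step is
   port (
      psclk    : in  std_logic;         -- same clock that drives the DCM PSCLK pin
      step_req : in  std_logic;         -- one-cycle request from the test logic
      step_up  : in  std_logic;         -- '1' = increment phase, '0' = decrement
      psdone   : in  std_logic;         -- DCM PSDONE
      psen     : out std_logic := '0';  -- DCM PSEN
      psincdec : out std_logic := '0';  -- DCM PSINCDEC
      busy     : out std_logic          -- '1' while a step is in progress
   );
end entity ps_step;

architecture rtl of ps_step is
   type state_t is (idle, waiting);
   signal state : state_t := idle;
begin
   process (psclk)
   begin
      if rising_edge(psclk) then
         psen <= '0';                   -- default: PSEN is a single-cycle pulse
         case state is
            when idle =>
               if step_req = '1' then
                  psen     <= '1';      -- start one phase step
                  psincdec <= step_up;  -- direction of the step
                  state    <= waiting;
               end if;
            when waiting =>
               if psdone = '1' then     -- DCM reports the step is applied
                  state <= idle;
               end if;
         end case;
      end if;
   end process;

   busy <= '1' when state = waiting else '0';
end architecture rtl;
```

The test logic steps the phase, runs a write/read check, records
whether it passed, and repeats. The middle of the passing range is then
what goes into the fixed phase shift value.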

At 100 MHz I would expect the timing margins to be quite good
even in the slowest speed grade parts.  I'm using Virtex 2 -5
speed grade in my 133 MHz design.

> I have a feeling adding some constraints would make the thing work
> with a single DCM. Unfortunately I have no clue what constraints to
> add, as I don't know what's going wrong (and don't know much about
> constraints writing anyway).
>

The problem with a single DCM is that you need to make up
for phase differences in the board routing.  Signals to the DDR
memory arrive there some prop. delay after they leave the FPGA.
At the memory end they need to meet setup and hold time to
the clock as it arrives at the memory, usually at the same
board routing delay as the clock.  So if your clock and data/
address/control outputs use the same internal clock, you
would need to use board routing or some other delay element
external to the FPGA to ensure hold time is met at the memory.

Then the data returning from the memory shows up 2 board
prop. delays from the driven clock, plus the clock to output
timing specified in the memory datasheet.  So the sampling
point isn't exactly centered within the outgoing clock half-
period.  So your sampling clock may need to be off by some
phase other than 90 degrees from the clock driving your
outputs.  All of this is pretty hard to accomplish with one
DCM, IMHO.  And just adding timing constraints without the
mechanism to meet them makes life miserable on the tools,
which usually fail miserably in response (they have only
internal routing delays to make up your requested timing).
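
To put rough numbers on the read path, here is a simulation-only
sketch of the arithmetic. Only the 100 MHz clock comes from this
thread. The trace delay and the memory clock-to-output delay are
placeholder values, not measurements or data sheet figures, and the
model ignores FPGA pad delays and the spread of the memory's output
timing.

```vhdl
-- Hypothetical back-of-envelope model of the read sampling point.
entity read_sample_budget is
end entity read_sample_budget;

architecture sim of read_sample_budget is
   constant T_CLK   : time := 10 ns;   -- 100 MHz DDR clock
   constant T_BOARD : time := 0.5 ns;  -- ASSUMED one-way board trace delay
   constant T_AC    : time := 0.7 ns;  -- ASSUMED memory clock-to-output delay

   -- clock travels out (T_BOARD), memory responds (T_AC), data travels back (T_BOARD)
   constant T_VALID  : time := 2 * T_BOARD + T_AC;

   -- each DDR data word is nominally valid for half a period,
   -- so the middle of the window is a quarter period later
   constant T_CENTER : time := T_VALID + T_CLK / 4;
begin
   process
   begin
      report "data valid at FPGA pins after " & time'image(T_VALID);
      report "centre of data window at " & time'image(T_CENTER);
      report "sampling phase in degrees: "
             & integer'image((T_CENTER * 360) / T_CLK);
      wait;  -- run once, then stop
   end process;
end architecture sim;
```

Any nonzero board or memory delay pushes the centre of the window later
than the nominal 90 degrees, which is why the sampling clock phase may
have to be adjusted per board.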