Reply by Nico Coesel September 21, 2006
David Ashley <dash@nowhere.net.dont.email.me> wrote:

>Nico Coesel wrote:
>> AFAIK the opencores ddr controller uses some sort of scheme which
>> routes the clock to the outside and pulls it back in again. This is
>> totally unnecessary IMHO.
>
>Yep you're right, that was the feedback line for one of the DCMs.
>Current design works but has no feedback from the outside as
>you suggest. Original design was right on the edge as regards
>sampling the DDR's output data.
>
>> For the DDR you'll need 2 DCMs: 1 to turn 50MHz into 100MHz and 1 to
>> get a clock which is 90 degrees out of phase. FDDRs have an internal
>> inverter in their clock inputs so 1 clock to drive these is
>> sufficient.
>
>You're exactly right. Also DCM -> DCM seems to work ok, however
>I'm ignoring the "locked" bit on the 50->100 DCM and the system
>only pays attention to the locked bit on the 2nd DCM. This is
>probably bad.
By the way, there is a Spartan-3 issue with daisy-chaining DCMs; see the other thread about 'product lifetime'. ISE 7.1 (I don't know about the other ISE versions) will warn you about this when routing the design.

--
Reply to nico@nctdevpuntnl (punt=.)
Companies and shops can be found at www.adresboekje.nl
Reply by Austin Lesea September 21, 2006
Gabor,

In the DCM status, there is the "clock lost" bit.  For the CLKFX, there
is also the "clock stopped" bit.  The DCM is a digital synchronous state
machine, so loss of input clock means that the lock bit, which is a
state, will never change.  These other two status bits are there to tell
you what happened (provide more information).
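To make this concrete, here is a minimal VHDL sketch that qualifies LOCKED with the STATUS bits instead of trusting LOCKED alone. The entity name dcm_clock_good and its ports are hypothetical, and the bit positions (STATUS(1) = CLKIN stopped, STATUS(2) = CLKFX stopped) are my reading of the Spartan-3/Virtex-II DCM documentation; check them against the data sheet for your family.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helper: combine the DCM LOCKED output with its STATUS bus
-- into a single "clock is really good" flag.
-- Assumed bit mapping (Spartan-3 / Virtex-II DCM):
--   STATUS(1) = CLKIN input stopped, STATUS(2) = CLKFX/CLKFX180 stopped.
entity dcm_clock_good is
  generic (
    USES_CLKFX : boolean := false  -- set true if the CLKFX outputs are used
  );
  port (
    locked     : in  std_logic;                     -- DCM LOCKED output
    status     : in  std_logic_vector(7 downto 0);  -- DCM STATUS bus
    clock_good : out std_logic                      -- '1' only if locked and no clock loss flagged
  );
end entity dcm_clock_good;

architecture rtl of dcm_clock_good is
  signal clkfx_ok : std_logic;
begin
  -- Only look at the CLKFX-stopped flag when CLKFX is actually in use.
  clkfx_ok <= '1' when (not USES_CLKFX) else not status(2);

  -- LOCKED is a state that simply freezes when CLKIN disappears,
  -- so also require that the CLKIN-stopped flag is clear.
  clock_good <= locked and (not status(1)) and clkfx_ok;
end architecture rtl;
```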

Good post,

Austin
Reply by Gabor September 21, 2006
David Ashley wrote:

> You're exactly right. Also DCM -> DCM seems to work ok, however
> I'm ignoring the "locked" bit on the 50->100 DCM and the system
> only pays attention to the locked bit on the 2nd DCM. This is
> probably bad.
>
> -Dave
>
> --
> David Ashley http://www.xdr.com/dash
> Embedded linux, device drivers, system architecture
Beware of "locked" bits on the Xilinx DCMs. Once locked, they tend to keep reporting locked even if the input clock goes away. You need to look at the "status" outputs to get the whole picture, and note that you must reset the DCM if you want it to attempt to re-lock after lock is lost. In the older parts (Spartan 2) with DLLs, the 2x clock output drives a 1x clock when the DLL is not locked. On those parts I actually use this "feature" to detect lock rather than using the "locked" output of the DLL (they have no status bus).

Regards,
Gabor
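The re-lock requirement can be sketched as a small state machine that pulses the DCM RST input whenever the status-qualified lock indication drops, then gives the DCM time to lock before trying again. Everything here is an assumption for illustration: the entity dcm_relock, the free-running clock free_clk (which must not come from the DCM being reset), and the RST_CYCLES and WAIT_CYCLES values. The required minimum RST pulse width is given in the DCM data sheet.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical re-lock helper. When clock_good (e.g. LOCKED qualified by
-- STATUS) drops, drive DCM RST high for RST_CYCLES cycles of free_clk,
-- release it, and wait up to WAIT_CYCLES for lock before retrying.
-- Assumes free_clk is NOT an output of the DCM being monitored.
entity dcm_relock is
  generic (
    RST_CYCLES  : positive := 8;     -- assumed RST pulse width; check the data sheet
    WAIT_CYCLES : positive := 65536  -- assumed time allowed for re-lock
  );
  port (
    free_clk   : in  std_logic;
    clock_good : in  std_logic;
    dcm_rst    : out std_logic
  );
end entity dcm_relock;

architecture rtl of dcm_relock is
  type state_t is (RUNNING, RESETTING, WAITING);
  signal state     : state_t   := RESETTING;  -- also resets the DCM at power-up
  signal counter   : natural   := 0;
  signal good_meta : std_logic := '0';
  signal good_sync : std_logic := '0';
begin
  process (free_clk)
  begin
    if rising_edge(free_clk) then
      -- clock_good comes from another clock domain: synchronize it first.
      good_meta <= clock_good;
      good_sync <= good_meta;

      case state is
        when RUNNING =>
          if good_sync = '0' then
            state   <= RESETTING;
            counter <= 0;
          end if;
        when RESETTING =>
          if counter = RST_CYCLES - 1 then
            state   <= WAITING;
            counter <= 0;
          else
            counter <= counter + 1;
          end if;
        when WAITING =>
          if good_sync = '1' then
            state <= RUNNING;
          elsif counter = WAIT_CYCLES - 1 then
            state   <= RESETTING;  -- no lock in time: try again
            counter <= 0;
          else
            counter <= counter + 1;
          end if;
      end case;
    end if;
  end process;

  dcm_rst <= '1' when state = RESETTING else '0';
end architecture rtl;
```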
Reply by David Ashley September 20, 2006
Nico Coesel wrote:
> AFAIK the opencores ddr controller uses some sort of scheme which
> routes the clock to the outside and pulls it back in again. This is
> totally unnecessary IMHO.
Yep, you're right, that was the feedback line for one of the DCMs. The current design works but has no feedback from the outside, as you suggest. The original design was right on the edge as regards sampling the DDR's output data.
> For the DDR you'll need 2 DCMs: 1 to turn 50MHz into 100MHz and 1 to
> get a clock which is 90 degrees out of phase. FDDRs have an internal
> inverter in their clock inputs so 1 clock to drive these is
> sufficient.
You're exactly right. Also DCM -> DCM seems to work ok, however I'm ignoring the "locked" bit on the 50->100 DCM and the system only pays attention to the locked bit on the 2nd DCM. This is probably bad.

-Dave
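One way to stop ignoring the first DCM's LOCKED output in a 50 -> 100 MHz -> phase-shift cascade is to hold the second DCM in reset until the first has locked, and to release the design reset only once both are locked. This is a hedged sketch: the entity dcm_chain_reset and its port names are hypothetical, and given Gabor's warning about LOCKED, the lock inputs should ideally also be qualified with the STATUS bits.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical lock sequencing for two cascaded DCMs
-- (DCM1: 50 -> 100 MHz, DCM2: phase-shifted outputs).
entity dcm_chain_reset is
  port (
    sys_clk     : in  std_logic;  -- buffered DCM2 output (e.g. CLK0)
    ext_rst     : in  std_logic;  -- board reset, active high
    dcm1_locked : in  std_logic;
    dcm2_locked : in  std_logic;
    dcm1_rst    : out std_logic;  -- to DCM1 RST
    dcm2_rst    : out std_logic;  -- to DCM2 RST
    sys_rst     : out std_logic   -- reset for the rest of the design
  );
end entity dcm_chain_reset;

architecture rtl of dcm_chain_reset is
  signal both_locked : std_logic;
  signal rst_pipe    : std_logic_vector(2 downto 0) := (others => '1');
begin
  dcm1_rst <= ext_rst;
  -- DCM2's input clock is not stable until DCM1 has locked.
  dcm2_rst <= ext_rst or not dcm1_locked;

  both_locked <= dcm1_locked and dcm2_locked;

  -- Assert sys_rst asynchronously when either DCM loses lock
  -- (sys_clk may have stopped); release it synchronously to sys_clk
  -- three cycles after both DCMs report lock.
  process (sys_clk, both_locked)
  begin
    if both_locked = '0' then
      rst_pipe <= (others => '1');
    elsif rising_edge(sys_clk) then
      rst_pipe <= rst_pipe(1 downto 0) & '0';
    end if;
  end process;

  sys_rst <= rst_pipe(2);
end architecture rtl;
```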
Reply by Nico Coesel September 20, 2006
"Gabor" <gabor@alacron.com> wrote:

>David Ashley wrote:
>[snip]
>> But the open cores DDR doesn't make use of the DQS strobe generated
>> by the DDR device itself. I'm only trying to run at 100 mhz. In that
>> case xilinx app notes say the timing is adequate so the DQS strobe isn't
>> needed to capture data reliably. Maybe the timing would get easier if
>> the logic made use of the DQS strobe from the DDR.
>
>I'm doing pretty much the same thing with Virtex 2 (similar
>architecture to Spartan 3) on a proprietary board. This board has a 66.66 MHz
>clock that is doubled to run the DDR at 133 MHz (266 DDR). I
>do not use the DQS inputs for sampling data. I did need to tweak
>the delay in my DCMs to get reliable sampling. I did not use any
>expensive test equipment for this, I just used the variable delay
>mode of the DCM to run tests at various phases and centered
>the final fixed value within the area that seemed to work.
>
>At 100 MHz I would expect the timing margins to be quite good
>even in the slowest speed grade parts. I'm using Virtex 2 -5
>speed grade in my 133 MHz design.
>
>> I have a feeling adding some constraints would make the thing work
>> with a single DCM. Unfortunately I have no clue what constraints to
>> add, as I don't know what's going wrong (and don't know much about
>> constraints writing anyway).
>
>The problem with a single DCM is that you need to make up
>for phase differences in the board routing. Signals to the DDR
>memory arrive there some prop. delay after they leave the FPGA.
>At the memory end they need to meet setup and hold time to
>the clock as it arrives at the memory, usually at the same
>board routing delay as the clock. So if your clock and
>data/address/control outputs use the same internal clock, you
>would need to use board routing or some other delay element
>external to the FPGA to ensure hold time is met at the memory.
All these problems go away if you drive the control signals at half the DDR clock frequency. This won't cost performance, since all DDR commands need 2 clock cycles to execute anyway. The only signal that needs to be fast is CS (which also happens to be the least loaded line in a larger memory system). Clocking data into the memory uses DQS, which has the same delay as the DQ lines (if your PCB is routed as it is supposed to be).
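This half-rate idea resembles what is often called 2T command timing. A minimal sketch, assuming a single controller clock whose edges line up with the memory clock: RAS#/CAS#/WE# and the address change only every other cycle and are then held for two clocks, while CS# is low only for the first of them. The memory therefore sees CS# low at an edge in the middle of the window where the slow lines are stable, and only CS# keeps single-cycle timing. The entity ddr_cmd_2t, its ports and the 13-bit address width are hypothetical; bank address, the rest of a real controller, and output/board delays are left out.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical half-rate ("2T") command output stage.
entity ddr_cmd_2t is
  port (
    clk       : in  std_logic;                      -- controller clock, e.g. sys_clk
    cmd_valid : in  std_logic;                      -- a command is waiting
    cmd_ras_n : in  std_logic;
    cmd_cas_n : in  std_logic;
    cmd_we_n  : in  std_logic;
    cmd_addr  : in  std_logic_vector(12 downto 0);  -- width is an assumption
    cmd_ready : out std_logic;                      -- '1' in the command slot
    ddr_cs_n  : out std_logic;
    ddr_ras_n : out std_logic;
    ddr_cas_n : out std_logic;
    ddr_we_n  : out std_logic;
    ddr_addr  : out std_logic_vector(12 downto 0)
  );
end entity ddr_cmd_2t;

architecture rtl of ddr_cmd_2t is
  signal phase   : std_logic := '0';  -- '0' = command slot, '1' = hold slot
  signal cs_n_q  : std_logic := '1';  -- start deselected
  signal ras_n_q : std_logic := '1';
  signal cas_n_q : std_logic := '1';
  signal we_n_q  : std_logic := '1';
  signal addr_q  : std_logic_vector(12 downto 0) := (others => '0');
begin
  cmd_ready <= not phase;  -- commands are accepted only in the command slot

  process (clk)
  begin
    if rising_edge(clk) then
      phase <= not phase;
      if phase = '0' then
        if cmd_valid = '1' then
          -- Slow lines change here and then stay put for two clocks.
          ras_n_q <= cmd_ras_n;
          cas_n_q <= cmd_cas_n;
          we_n_q  <= cmd_we_n;
          addr_q  <= cmd_addr;
          cs_n_q  <= '0';  -- select for one clock only
        else
          ras_n_q <= '1';  -- NOP encoding on the slow lines
          cas_n_q <= '1';
          we_n_q  <= '1';
          cs_n_q  <= '1';
        end if;
      else
        -- Hold slot: deselect; slow lines keep their values, so they are
        -- stable for a full clock before and after the edge that sees CS# low.
        cs_n_q <= '1';
      end if;
    end if;
  end process;

  ddr_cs_n  <= cs_n_q;
  ddr_ras_n <= ras_n_q;
  ddr_cas_n <= cas_n_q;
  ddr_we_n  <= we_n_q;
  ddr_addr  <= addr_q;
end architecture rtl;
```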
>Then the data returning from the memory shows up 2 board
>prop. delays from the driven clock, plus the clock to output
>timing specified in the memory datasheet. So the sampling
>point isn't exactly centered within the outgoing clock
>half-period. So your sampling clock may need to be off by some
>phase other than 90 degrees from the clock driving your
>outputs. All of this is pretty hard to accomplish with one
>DCM, IMHO. And just adding timing constraints without the
>mechanism to meet them makes life miserable on the tools,
>which usually fail miserably in response (they have only
>internal routing delays to make up your requested timing).
If you delay DQS through the IOBDELAY element and use this delayed strobe to clock DQ (without IOBDELAY) into the IOB flip-flops, then setup and hold timing should be met (given the proper constraints). But beware: there are severe restrictions on how the IOBs must be arranged, and you may need to match the FPGA speed to the memory speed.
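At the RTL level, DQS-clocked capture looks roughly like the following. This is a behavioural sketch only: dqs_dly stands for DQS after the IOB input delay, the entity name and the 16-bit width are assumptions, and the IOB placement, the delay element and the constraints that make it work in silicon are not shown. DQS is also bidirectional and only toggles during read bursts, and the captured words still have to be moved into the system clock domain (for example through a small FIFO).

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Behavioural sketch of DQS-based read capture (hypothetical entity).
entity dqs_capture is
  generic (
    WIDTH : positive := 16  -- assumed DQ width
  );
  port (
    dqs_dly : in  std_logic;                             -- DQS after the input delay
    dq      : in  std_logic_vector(WIDTH - 1 downto 0);  -- DQ, no extra delay
    dq_rise : out std_logic_vector(WIDTH - 1 downto 0);  -- word captured on DQS rising edge
    dq_fall : out std_logic_vector(WIDTH - 1 downto 0)   -- word captured on DQS falling edge
  );
end entity dqs_capture;

architecture behavioral of dqs_capture is
begin
  -- First word of each DQS period.
  process (dqs_dly)
  begin
    if rising_edge(dqs_dly) then
      dq_rise <= dq;
    end if;
  end process;

  -- Second word of each DQS period.
  process (dqs_dly)
  begin
    if falling_edge(dqs_dly) then
      dq_fall <= dq;
    end if;
  end process;
end architecture behavioral;
```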
Reply by Nico Coesel September 20, 2006
David Ashley <dash@nowhere.net.dont.email.me> wrote:

>Nico Coesel wrote:
>> All you need is a normal clock and a 90 degrees phase shifted clock.
>> The whole clocking outside the fpga thing is unnecessary. If you place
>> the output flipflops inside the IOBs and use an fddr in the IOB to
>> replicate the internal clock, all signals connected to the DDR memory
>> will have the same delay.
>
>But the DDR spec says the DQS strobe for data written by the
>fpga must be center aligned. The DQS is in phase with the
>DDR clock. That means the data must be put on the lines
>1/2 of 1/2 of a clock cycle early for proper alignment.
>
>This requires a clock that is 270 degrees out of phase from the
>DDR's clock. This is the clock used for the data lines going into
>the DDR.
Yes.
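The quarter-cycle offset for write data works out as follows at the 100 MHz clock used in this thread, assuming ideal clocks and ignoring board delays:

```latex
T = \frac{1}{100\,\mathrm{MHz}} = 10\,\mathrm{ns}, \qquad
t_{\mathrm{bit}} = \frac{T}{2} = 5\,\mathrm{ns}

\Delta t_{\mathrm{DQ}} = -\frac{T}{4} = -2.5\,\mathrm{ns}
  \equiv T - \frac{T}{4} = \frac{3T}{4} = 7.5\,\mathrm{ns} \pmod{T}

\varphi_{\mathrm{DQ}} = \frac{7.5\,\mathrm{ns}}{10\,\mathrm{ns}} \cdot 360^\circ = 270^\circ
```

Each DQS edge, launched in phase with the DDR clock, then lands 2.5 ns into a 5 ns bit, i.e. in its middle.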
>I don't understand the "clocking outside the fpga" you mention.
AFAIK the OpenCores DDR controller uses some sort of scheme that routes the clock to the outside and pulls it back in again. This is totally unnecessary IMHO.
>The fpga currently has one 50 mhz external clock source. I
>run that through a DCM to make it 100 mhz. Then in order for
>the DDR to work I need to use two more DCMs. One is used
>to make the DDR clocks (positive and negative). The other is
>used for everything else.
For the DDR you'll need 2 DCMs: 1 to turn 50MHz into 100MHz and 1 to get a clock that is 90 degrees out of phase. FDDRs have an internal inverter in their clock inputs, so 1 clock is sufficient to drive them. Each of the two DCMs can also provide a divided clock, and a 200MHz clock can be generated as well.
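The FDDR trick (one clock plus the built-in inverter, data inputs tied to constants) can be illustrated with a simulation-only behavioural model of a DDR output register. This is not the Xilinx FDDR primitive itself, whose exact port list is not reproduced here; the entity ddr_out_model is hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Simulation-only model of a DDR output register (the role an FDDR plays
-- in the IOB): d0 is captured on the rising edge of clk, d1 on the falling
-- edge (the "internal inverter"), and q shows whichever was captured last.
entity ddr_out_model is
  port (
    clk : in  std_logic;
    d0  : in  std_logic;
    d1  : in  std_logic;
    q   : out std_logic
  );
end entity ddr_out_model;

architecture sim of ddr_out_model is
  signal q0 : std_logic := '0';
  signal q1 : std_logic := '0';
begin
  process (clk)
  begin
    if rising_edge(clk) then
      q0 <= d0;  -- first half of the clock period
    end if;
    if falling_edge(clk) then
      q1 <= d1;  -- second half of the clock period
    end if;
  end process;

  -- Select the register updated most recently.
  q <= q0 when clk = '1' else q1;
end architecture sim;
```

Tying d0 to '1' and d1 to '0' makes q a copy of clk (the DDR clock CK); tying d0 to '0' and d1 to '1' gives its complement (CK#). Because these outputs come from the same kind of IOB register as the data and control outputs, they see the same clock-to-out delay.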
Reply by David Ashley September 19, 2006
David Ashley wrote:
> Hope this is of use to other people.
> -Dave
>
I've gotten email asking for the source, so I put it up. It can be found here:

http://www.xdr.com/dash/fpga/

It's targeted to a Linux build environment. It needs unisim to be in the right place in order to build as is... or tweak the Makefile. It's a nearly identical copy of the OpenCores DDR controller, except that I removed one DCM and wrapped it all in a synthesizable tester targeted to the Spartan-3E Starter Board. The test fills memory with a non-repeating pattern, then reads it back. If the pattern matches, an LED stays lit. It keeps doing this forever.

-Dave
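A write/read-verify tester like this needs a pattern source that both the writer and the checker can regenerate. One possible approach, not necessarily what the posted tester does, is a 32-bit maximal-length LFSR; the tap set 32, 22, 2, 1 is the one commonly listed for 32 bits, and the entity and port names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical test-pattern source: a 32-bit maximal-length LFSR with taps
-- 32, 22, 2, 1. It steps through 2^32 - 1 distinct non-zero words before
-- repeating, far more than the number of words in the memory.
entity lfsr32_pattern is
  generic (
    SEED : std_logic_vector(31 downto 0) := x"00000001"  -- must be non-zero
  );
  port (
    clk     : in  std_logic;
    restart : in  std_logic;  -- reload SEED at the start of a pass
    advance : in  std_logic;  -- step once per word written or read
    pattern : out std_logic_vector(31 downto 0)
  );
end entity lfsr32_pattern;

architecture rtl of lfsr32_pattern is
  signal state : std_logic_vector(31 downto 0) := SEED;
begin
  process (clk)
    variable fb : std_logic;
  begin
    if rising_edge(clk) then
      if restart = '1' then
        state <= SEED;
      elsif advance = '1' then
        -- Feedback from taps 32, 22, 2 and 1 (state bits 31, 21, 1, 0).
        fb    := state(31) xor state(21) xor state(1) xor state(0);
        state <= state(30 downto 0) & fb;
      end if;
    end if;
  end process;

  pattern <= state;
end architecture rtl;
```

The write side and the read side would each instantiate it with the same SEED, pulse restart at the start of a pass and advance once per word; the checker compares each word read back against pattern.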
Reply by David Ashley September 19, 2006
Gabor wrote:
> David Ashley wrote:
> [snip]
> I'm doing pretty much the same thing with Virtex 2 (similar
> architecture to Spartan 3) on a proprietary board. This board has a 66.66 MHz
> clock that is doubled to run the DDR at 133 MHz (266 DDR). I
> do not use the DQS inputs for sampling data. I did need to tweak
> the delay in my DCMs to get reliable sampling. I did not use any
> expensive test equipment for this, I just used the variable delay
> mode of the DCM to run tests at various phases and centered
> the final fixed value within the area that seemed to work.
See the other email in this thread for details. I got it working by sampling data from the DDR on the 90-degree phase clock, and now it works fine. No tweaking of the DCM is necessary, and I'm only using one DCM. The DDR's DQS output transitions right when the data coming out of the DDR becomes valid, but the DDR controller has to transition DQS right in the middle of the valid window of the data going to the DDR. This is hardly fair. I wish there wasn't even a DQS signal; it's just a PITA.

-Dave
Reply by David Ashley September 19, 2006
David Ashley wrote:
> I will certainly share whatever I learn.
I got my simple write/read-verify system to work. I was able to get rid of one of the DCMs, so I only need 2.

DCM #1 takes the 50 MHz input, and I use its 2X output to drive a clock buffer. This is the tclock signal. Feedback comes from the clock buffer.

DCM #2 takes tclock and produces the four phase outputs. The 0 and 270 outputs drive 2 clock buffers, and the buffered 0 clock goes back into the feedback input of the DCM. These signals are sys_clk and sys_clk270.

FDDRs are used to produce the DDR's clock. Their inputs are hardwired to "01" for the true clock and "10" for the negative clock. Both FDDRs take their clocks from sys_clk and inverted sys_clk. The inverter is implicit in the FDDR configuration, so there is no delay penalty.

Here's the trick: the original OpenCores DDR controller source sampled the data from the DDR on the rising and falling edges of sys_clk. I instead push out the sampling by 1/4 of a cycle:

  rising_edge(sys_clk)  replaced by falling_edge(sys_clk270)
  falling_edge(sys_clk) replaced by rising_edge(sys_clk270)

Then I made a slight tweak to get the sampled data back into the sys_clk domain, as required elsewhere. It works fine. I had a feeling the problem was on the sampling side, since no special machinery existed to sample in the middle of the data-valid window. The setup time was not being met.

Here's a sample of the code before and after the fix:

  -- **** CODE BEFORE FIX
  process (sys_clk)
  begin
    if rising_edge(sys_clk) then
      -- sample HI-data word with rising edge
      data_hi_q <= data;
      -- store HI- and LO- data word in 32bit output register
      data_out_q <= data_hi_q & data_lo2_q;
    end if;
  end process;
  -- ...
  process (sys_clk)
  begin
    if falling_edge(sys_clk) then
      -- sample LO- word with falling edge
      data_lo1_q <= data;
      -- 1 clock additional delay to store HI- and LO-word
      -- with the next rising edge as 32bit word
      data_lo2_q <= data_lo1_q;
    end if;
  end process;

  -- ***** CODE AFTER FIX
  process (sys_clk270)
  begin
    if falling_edge(sys_clk270) then
      -- sample HI-data word on the falling edge of sys_clk270
      -- (90 degrees after the rising edge of sys_clk)
      data_hi_q <= data;
    end if;
  end process;

  process (sys_clk) -- (DA) fix to get back into sys_clk domain
  begin
    if rising_edge(sys_clk) then
      -- store HI- and LO- data word in 32bit output register
      data_out_q <= data_hi_q & data_lo2_q;
    end if;
  end process;
  -- ...
  process (sys_clk270)
  begin
    if rising_edge(sys_clk270) then
      -- sample LO- word on the rising edge of sys_clk270
      -- (90 degrees after the falling edge of sys_clk)
      data_lo1_q <= data;
      -- 1 clock additional delay to store HI- and LO-word
      -- with the next rising edge of sys_clk as 32bit word
      data_lo2_q <= data_lo1_q;
    end if;
  end process;

Hope this is of use to other people.

-Dave
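The edge arithmetic behind the fix, assuming 100 MHz (T = 10 ns) and ideal DCM outputs, with phases measured from the rising edge of sys_clk:

```latex
\text{rising}(\mathtt{sys\_clk270}) = 270^\circ, \qquad
\text{falling}(\mathtt{sys\_clk270}) = 270^\circ + 180^\circ = 450^\circ \equiv 90^\circ

\text{HI word: } 0^\circ \to 90^\circ, \qquad
\text{LO word: } 180^\circ \to 270^\circ, \qquad
\text{both delayed by } \frac{T}{4} = 2.5\,\mathrm{ns}

t_{\mathtt{data\_hi\_q} \to \mathtt{data\_out\_q}}
  = \frac{360^\circ - 90^\circ}{360^\circ}\,T = \frac{3}{4}T = 7.5\,\mathrm{ns}

t_{\mathtt{data\_lo2\_q} \to \mathtt{data\_out\_q}}
  = \frac{360^\circ - 270^\circ}{360^\circ}\,T = \frac{1}{4}T = 2.5\,\mathrm{ns}
```

The data_lo2_q to data_out_q transfer back into the sys_clk domain is the tight path (a quarter cycle), so it is the one to watch in the timing report.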
Reply by Gabor September 19, 2006
David Ashley wrote:
[snip]
> But the open cores DDR doesn't make use of the DQS strobe generated
> by the DDR device itself. I'm only trying to run at 100 mhz. In that
> case xilinx app notes say the timing is adequate so the DQS strobe isn't
> needed to capture data reliably. Maybe the timing would get easier if
> the logic made use of the DQS strobe from the DDR.
I'm doing pretty much the same thing with Virtex 2 (similar architecture to Spartan 3) on a proprietary board. This board has a 66.66 MHz clock that is doubled to run the DDR at 133 MHz (266 DDR). I do not use the DQS inputs for sampling data. I did need to tweak the delay in my DCMs to get reliable sampling. I did not use any expensive test equipment for this; I just used the variable delay mode of the DCM to run tests at various phases and centered the final fixed value within the range that seemed to work.

At 100 MHz I would expect the timing margins to be quite good, even in the slowest speed grade parts. I'm using a Virtex 2 -5 speed grade in my 133 MHz design.
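The calibration procedure can be written down as follows. The first formula is the fixed-mode phase shift as I understand it from the Virtex-II/Spartan-3 DCM documentation (PHASE_SHIFT in units of 1/256 of the CLKIN period; verify for your part). P_min and P_max stand for the smallest and largest settings that passed the sweep, and the numbers in the last line are purely illustrative.

```latex
t_{\mathrm{shift}} = \frac{\mathtt{PHASE\_SHIFT}}{256} \cdot T_{\mathrm{CLKIN}}

\mathtt{PHASE\_SHIFT}_{\mathrm{fixed}}
  = \operatorname{round}\!\left(\frac{P_{\min} + P_{\max}}{2}\right)

P_{\min} = 20,\; P_{\max} = 90,\; T_{\mathrm{CLKIN}} = 7.5\,\mathrm{ns}
  \;\Rightarrow\; \mathtt{PHASE\_SHIFT}_{\mathrm{fixed}} = 55,\quad
  t_{\mathrm{shift}} = \frac{55}{256} \cdot 7.5\,\mathrm{ns} \approx 1.61\,\mathrm{ns}
```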
> I have a feeling adding some constraints would make the thing work
> with a single DCM. Unfortunately I have no clue what constraints to
> add, as I don't know what's going wrong (and don't know much about
> constraints writing anyway).
The problem with a single DCM is that you need to make up for phase differences in the board routing. Signals to the DDR memory arrive there some propagation delay after they leave the FPGA. At the memory end they need to meet setup and hold time relative to the clock as it arrives at the memory, usually after the same board routing delay as the signals. So if your clock and data/address/control outputs use the same internal clock, you would need board routing or some other delay element external to the FPGA to ensure hold time is met at the memory.

Then the data returning from the memory shows up 2 board propagation delays after the driven clock, plus the clock-to-output time specified in the memory datasheet. So the sampling point isn't exactly centered within the outgoing clock half-period, and your sampling clock may need to be offset by some phase other than 90 degrees from the clock driving your outputs. All of this is pretty hard to accomplish with one DCM, IMHO. And just adding timing constraints without the mechanism to meet them makes life miserable for the tools, which usually fail miserably in response (they only have internal routing delays to make up your requested timing).
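The read-timing argument in formula form, as a simplified model: t_co is the FPGA clock-to-pad delay, t_pcb the one-way board delay (clock and data assumed routed with equal length), and t_AC the memory's access time from CK taken at its nominal value. FPGA input delays, DCM jitter and skew are ignored, and the numeric line is illustrative only.

```latex
t_{\mathrm{valid}} = t_{\mathrm{co}} + t_{\mathrm{pcb}} + t_{\mathrm{AC}} + t_{\mathrm{pcb}}
                   = t_{\mathrm{co}} + 2\,t_{\mathrm{pcb}} + t_{\mathrm{AC}}

t_{\mathrm{sample}} = t_{\mathrm{valid}} + \frac{T}{4}, \qquad
\varphi_{\mathrm{sample}} = \left(360^\circ \cdot \frac{t_{\mathrm{sample}}}{T}\right) \bmod 360^\circ

T = 10\,\mathrm{ns},\; t_{\mathrm{co}} = 1\,\mathrm{ns},\; t_{\mathrm{pcb}} = 0.5\,\mathrm{ns},\; t_{\mathrm{AC}} = 0
  \;\Rightarrow\; t_{\mathrm{sample}} = 1 + 2(0.5) + 0 + 2.5 = 4.5\,\mathrm{ns},\quad
  \varphi_{\mathrm{sample}} = 162^\circ
```

With all delays set to zero the model gives the familiar 90 degrees; any real board and device delays move the ideal sampling phase away from it, which is why a separately adjustable sampling clock (or DQS-based capture) is needed.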