Reply by Philip Pemberton, May 24, 2010
On Mon, 24 May 2010 01:26:07 +0100, Brian Drummond wrote:

> I don't do Verilog but it makes sense that there's an equivalent to
> setting attributes for such things in VHDL. And applying them directly
> to the correct signals will save warnings elsewhere...

I had another look at the constraints manual last night and found what I
was looking for. Apparently, the syntax I was looking for was:

  (* IOB="TRUE" *) output sdram_xyz;

(replace 'output sdram_xyz' with the I/O spec).

This also turned up a nasty INTERNAL_ERROR bug in Xst, see my other thread
for more info on this (my second post in that thread contains an
explanation of the bug, and a workaround).

> Be aware that XST is finicky though. Your "FORCE" attributes may merely
> result in "constraint is being ignored" warnings unless everything else
> lines up right (duplicate regs not being optimised away) so if you don't
> get what you expect in the .mrp, check the synth report carefully...

Actually, it seems that IOB=FORCE makes Xst bail out if it can't honour
the constraint. Some of the SDRAM controller cache logic uses BA
internally; setting IOB=FORCE stops the Translate process from completing.
Use IOB=TRUE, and ISE appears to add a couple of extra FFs to allow
SDRAM_BA to be pushed into the IOB.

I've also noticed that it's running the SDRAM pins in slew-rate limited
(SLOW) mode, so I'm going to see if adding a "SLEW=FAST" constraint in the
UCF helps out any (hopefully it'll get the setup/hold times down from ~6ns
to something a little more reasonable).

-- 
Phil.
usenet10@philpem.me.uk
http://www.philpem.me.uk/
If mail bounces, replace "10" with the last two digits of the current year
Reply by Brian Drummond, May 23, 2010
On 23 May 2010 18:21:25 GMT, Philip Pemberton <usenet10@philpem.me.uk>
wrote:

>It seems I set "Pack I/O Registers into IOBs" to "Yes" on the working
>version (which causes A LOT of warnings), while it's set to "Auto" in the
>"broken" version. Can I force FFs in the IOBs in the UCF constraints, or
>do I need to do that with a "// synthesis IOB=FORCE" constraint in the
>Verilog source?

UCF is a bit too late for synthesis... the only tool that reads it is
NGDbuild, aka "Translate", which embeds the UCF information in other files
passed downstream.

I don't do Verilog but it makes sense that there's an equivalent to
setting attributes for such things in VHDL. And applying them directly to
the correct signals will save warnings elsewhere...

Be aware that XST is finicky though. Your "FORCE" attributes may merely
result in "constraint is being ignored" warnings unless everything else
lines up right (duplicate regs not being optimised away) so if you don't
get what you expect in the .mrp, check the synth report carefully...

- Brian
Reply by Philip Pemberton, May 23, 2010
On Sun, 23 May 2010 11:07:37 +0100, Brian Drummond wrote:

> Given such a slow clock they look OK.

Always good to know :)

I'm toying with the idea of running the SDRAM controller faster than the
CPU core (the limiter is the CPU -- it manages about 60MHz on a Cyclone2
IIRC; Xst reckons about 47MHz for the entire SoC on a Spartan3A
XC3S700A-4C).

> But seeing that has prompted some memories (it's a few years since I set
> up constraints for SDR SDRAM).

Yeah, it seems a lot of folk have moved onto DDR or DDR2. SDR-SDRAM seems
to have the edge in ease-of-use, but loses out on raw speed. But that
said, neither of them can match an SRAM clock-for-clock because of the
refresh, precharge and select cycles, and the access latency. Although the
caching in the sdram_wb core makes that a bit of a moot point, especially
for sequential WISHBONE accesses.

> Look at the I/O report near the end of the Map Report (.mrp) file. For
> each I/O pin you will see a lot of information including the I/O
> standard, and the registers in the IOB for that pin. For an output pin
> (e.g. address) I want to see OFF or OUTFF in that list. For an I/O pin
> (data) I want to see IFF/INFF, OFF/OUTFF and ENBFF which tristates the
> pin. (Signal names seem to have changed with tool versions)

Oh, that explains a lot!

The "broken" version shows blanks under "Reg(s)" for all the SDRAM pins.
The "working" version shows a mix of "OFF1", "IFF1" and blank (only
SDRAM_CLK and SDRAM_CKE are blank, which is fair enough -- CLK comes from
the DCM, CKE is grounded).

Thanks, I'd looked at the Map report, but previously didn't really know
what I was looking for, which explains why I didn't pick up on the FFs not
being pushed into the IOBs...

It seems I set "Pack I/O Registers into IOBs" to "Yes" on the working
version (which causes A LOT of warnings), while it's set to "Auto" in the
"broken" version. Can I force FFs in the IOBs in the UCF constraints, or
do I need to do that with a "// synthesis IOB=FORCE" constraint in the
Verilog source?

> At 25MHz, feel free to ignore all the above, but it may help to see some
> of what's going on beneath the hood.

Well, I'm trying it out at 25MHz because I figure the lower my master
clock is, the easier it's going to be to make the thing work. Then once
it's working, I can look into making it work on a faster clock.

Ideally I'd like to get it going at 50MHz or so -- a lot of processing is
going to happen in the FPGA (using hardware implementations of the
algorithms I'm using) but the CPU (a hacked up version of the
LatticeMico32) will be doing a lot of the integer work, framebuffer
updating, and so on.

Plan #2 is to rig up an LCD controller that can act as a WISHBONE master,
then wire that up to one of the spare master ports on the CONMAX bus
arbiter. Then I can use any area of main RAM as the framebuffer, and do
away with the messy business of having a separate framebuffer RAM.

If any of you guys want to see this code, let me know and I'll stick it
online. It's pretty ropey code, but it might do as an example to show how
to make the LM32 work on non-Lattice hardware (and how to make the
toolchain behave itself).

On a final note: the ISSI datasheet for the RAM chip appears to be
outright WRONG. It specifies 4096 refresh cycles per 64ms, but if the
refresh rate is that low I get data readback errors. If I use the refresh
rate for the Industrial-graded chip (4096 per 32ms), or even 4096 cycles
per 50us, then it works fine... Yes, I'm using a "Commercial" grade part,
not the "Industrial" part. Unless mine has been mismarked....

-- 
Phil.
usenet10@philpem.me.uk
http://www.philpem.me.uk/
If mail bounces, replace "10" with the last two digits of the current year
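
[Editor's note] The two datasheet refresh figures mentioned above can be turned into a per-refresh interval. This is a minimal Python sketch, assuming the refresh commands are spread evenly across the window and the controller runs from the 25 MHz clock discussed in the thread. The function name is illustrative and is not part of the sdram_wb core.

```python
# Sketch: interval between evenly spaced refresh commands, in microseconds
# and in whole controller clock cycles. All names here are illustrative.

def refresh_interval(rows, window_ms, clock_hz):
    """Return (interval in microseconds, whole clock cycles per refresh)."""
    interval_us = window_ms * 1000.0 / rows
    # Integer division rounds down, so a counter loaded with this value
    # never issues a refresh later than the datasheet window allows.
    cycles = (window_ms * clock_hz) // (1000 * rows)
    return interval_us, cycles


if __name__ == "__main__":
    for label, window_ms in (("4096 per 64 ms", 64), ("4096 per 32 ms", 32)):
        us, cycles = refresh_interval(4096, window_ms, 25_000_000)
        print(f"{label}: one refresh every {us} us ({cycles} cycles at 25 MHz)")
```

With these inputs the 64 ms window works out to one refresh every 15.625 us (390 cycles) and the 32 ms window to one every 7.8125 us (195 cycles), so comparing the controller's actual refresh counter against these values is one way to check which rate is really being applied.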
Reply by Brian Drummond, May 23, 2010
On 23 May 2010 09:14:47 GMT, Philip Pemberton <usenet10@philpem.me.uk>
wrote:

>On Sat, 22 May 2010 20:11:25 -0700, Gabor wrote:
>
>> As others have mentioned, you probably have some unconstrained paths
>> causing timing violations. [...]
>
>OK, I've just set up these constraints:
>#Created by Constraints Editor (xc3s700a-ft256-4) - 2010/05/23
>TIMEGRP "sdram_outs" OFFSET = OUT 10 ns AFTER "CLOCK";
>TIMEGRP "sdram_outs" OFFSET = IN 10 ns VALID 10 ns BEFORE "CLOCK";
>
>Now I can build the core with OPTIMIZE=area or OPTIMIZE=speed, and it
>works fine.
>
>Question: do these timing constraints look sane? I figured since I'm
>using a 270-degree shifted version of a DCM'd version of the input clock,
>the timing settings should be around a quarter of Tclk_period (Clk period
>is 40ns for 25MHz, so that would be 10ns).

Given such a slow clock they look OK.

But seeing that has prompted some memories (it's a few years since I set
up constraints for SDR SDRAM).

The key to getting good I/O timing is to ensure the tools place the I/O
registers in the right place - the IOBs rather than the core logic. Then
there is no routing involved, and the constraints really only act as a
sanity check. (at 200MHz they may alert you to the wrong output standard)

If some of your registers were in the IOBs and others weren't, the latter
are subject to additional routes of random lengths, and here the
constraints WILL help, by forcing PAR to keep these routes down. (and 10ns
should be easily achievable).

Look at the I/O report near the end of the Map Report (.mrp) file. For
each I/O pin you will see a lot of information including the I/O standard,
and the registers in the IOB for that pin. For an output pin (e.g.
address) I want to see OFF or OUTFF in that list. For an I/O pin (data) I
want to see IFF/INFF, OFF/OUTFF and ENBFF which tristates the pin. (Signal
names seem to have changed with tool versions)

Getting what you want can take some fiddling. For example, you may need to
duplicate registers in your code; one to feed the pins and another to use
the signal internally. Then you need to convince the synthesis tool to
leave them alone; apply the "equivalent-register-removal = no" attribute
to the appropriate regs. And check the .MRP file. Loop until done.

A few tool versions ago, you also needed to replicate the tristate signal
for each ENBFF, and ensure it was the right polarity (active low) but this
may have been improved.

Downside to all this is that while you have REALLY GOOD external timings,
you have lengthened the internal routes by a few ns. So I keep heavy
processing hidden behind a second register where that is likely to be a
problem.

At 25MHz, feel free to ignore all the above, but it may help to see some
of what's going on beneath the hood.

- Brian
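
[Editor's note] As a rough check on the quarter-period figure in the quoted question, this Python sketch converts a clock frequency and a DCM phase shift into a period and a delay in nanoseconds. It only reproduces the arithmetic; whether 10 ns is the right OFFSET value also depends on the SDRAM's own setup and hold requirements, which the thread does not give. The function name is illustrative.

```python
# Sketch: clock period and phase-shift delay in nanoseconds.
# The function name and argument names are illustrative only.

def phase_to_ns(freq_hz, phase_deg):
    """Return (clock period in ns, delay in ns for the given phase shift)."""
    period_ns = 1e9 / freq_hz
    delay_ns = period_ns * (phase_deg % 360) / 360.0
    return period_ns, delay_ns


if __name__ == "__main__":
    period, delay = phase_to_ns(25_000_000, 270)
    # A 270-degree delay puts the shifted edge (period - delay) ahead of
    # the next unshifted edge, which is where the quarter period comes from.
    print(f"period = {period} ns, CLK270 delay = {delay} ns, "
          f"lead before next edge = {period - delay} ns")
```

At 25 MHz the period is 40 ns, the CLK270 output is delayed by 30 ns, and its edge therefore leads the next unshifted edge by 10 ns.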
Reply by Philip Pemberton, May 23, 2010
On Sat, 22 May 2010 20:11:25 -0700, Gabor wrote:

> As others have mentioned, you probably have some unconstrained paths
> causing timing violations. [...]

OK, I've just set up these constraints:

#Created by Constraints Editor (xc3s700a-ft256-4) - 2010/05/21
NET "CLOCK" TNM_NET = CLOCK;
TIMESPEC TS_CLOCK = PERIOD "CLOCK" 25 MHz HIGH 50%;
#Created by Constraints Editor (xc3s700a-ft256-4) - 2010/05/23
INST "SDRAM_A<0>" TNM = sdram_outs;
INST "SDRAM_A<1>" TNM = sdram_outs;
INST "SDRAM_A<2>" TNM = sdram_outs;
INST "SDRAM_A<3>" TNM = sdram_outs;
INST "SDRAM_A<4>" TNM = sdram_outs;
INST "SDRAM_A<5>" TNM = sdram_outs;
INST "SDRAM_A<6>" TNM = sdram_outs;
INST "SDRAM_A<7>" TNM = sdram_outs;
INST "SDRAM_A<8>" TNM = sdram_outs;
INST "SDRAM_A<9>" TNM = sdram_outs;
INST "SDRAM_A<10>" TNM = sdram_outs;
INST "SDRAM_A<11>" TNM = sdram_outs;
INST "SDRAM_BA<0>" TNM = sdram_outs;
INST "SDRAM_BA<1>" TNM = sdram_outs;
INST "SDRAM_CAS_N" TNM = sdram_outs;
INST "SDRAM_CKE" TNM = sdram_outs;
INST "SDRAM_CLK" TNM = sdram_outs;
INST "SDRAM_CS_N" TNM = sdram_outs;
INST "SDRAM_DQ<0>" TNM = sdram_outs;
INST "SDRAM_DQ<1>" TNM = sdram_outs;
INST "SDRAM_DQ<2>" TNM = sdram_outs;
INST "SDRAM_DQ<3>" TNM = sdram_outs;
INST "SDRAM_DQ<4>" TNM = sdram_outs;
INST "SDRAM_DQ<5>" TNM = sdram_outs;
INST "SDRAM_DQ<6>" TNM = sdram_outs;
INST "SDRAM_DQ<7>" TNM = sdram_outs;
INST "SDRAM_DQ<8>" TNM = sdram_outs;
INST "SDRAM_DQ<9>" TNM = sdram_outs;
INST "SDRAM_DQ<10>" TNM = sdram_outs;
INST "SDRAM_DQ<11>" TNM = sdram_outs;
INST "SDRAM_DQ<12>" TNM = sdram_outs;
INST "SDRAM_DQ<13>" TNM = sdram_outs;
INST "SDRAM_DQ<14>" TNM = sdram_outs;
INST "SDRAM_DQ<15>" TNM = sdram_outs;
INST "SDRAM_DQ<16>" TNM = sdram_outs;
INST "SDRAM_DQ<17>" TNM = sdram_outs;
INST "SDRAM_DQ<18>" TNM = sdram_outs;
INST "SDRAM_DQ<19>" TNM = sdram_outs;
INST "SDRAM_DQ<20>" TNM = sdram_outs;
INST "SDRAM_DQ<21>" TNM = sdram_outs;
INST "SDRAM_DQ<22>" TNM = sdram_outs;
INST "SDRAM_DQ<23>" TNM = sdram_outs;
INST "SDRAM_DQ<24>" TNM = sdram_outs;
INST "SDRAM_DQ<25>" TNM = sdram_outs;
INST "SDRAM_DQ<26>" TNM = sdram_outs;
INST "SDRAM_DQ<27>" TNM = sdram_outs;
INST "SDRAM_DQ<28>" TNM = sdram_outs;
INST "SDRAM_DQ<29>" TNM = sdram_outs;
INST "SDRAM_DQ<30>" TNM = sdram_outs;
INST "SDRAM_DQ<31>" TNM = sdram_outs;
INST "SDRAM_DQM<0>" TNM = sdram_outs;
INST "SDRAM_DQM<1>" TNM = sdram_outs;
INST "SDRAM_DQM<2>" TNM = sdram_outs;
INST "SDRAM_DQM<3>" TNM = sdram_outs;
INST "SDRAM_RAS_N" TNM = sdram_outs;
INST "SDRAM_WE_N" TNM = sdram_outs;
#Created by Constraints Editor (xc3s700a-ft256-4) - 2010/05/23
TIMEGRP "sdram_outs" OFFSET = OUT 10 ns AFTER "CLOCK";
TIMEGRP "sdram_outs" OFFSET = IN 10 ns VALID 10 ns BEFORE "CLOCK";

Now I can build the core with OPTIMIZE=area or OPTIMIZE=speed, and it
works fine.

Question: do these timing constraints look sane? I figured since I'm using
a 270-degree shifted version of a DCM'd version of the input clock, the
timing settings should be around a quarter of Tclk_period (Clk period is
40ns for 25MHz, so that would be 10ns).

CLOCK is the 25MHz crystal input, MCLK is the output from the first DCM (a
*25, /25 "multiplier" that effectively acts as a buffer and duty cycle
corrector). SDRAM_CLK is an output from the FPGA to the SDRAM, which is
sourced from the CLK270 output of the second DCM.

Thanks,
-- 
Phil.
usenet10@philpem.me.uk
http://www.philpem.me.uk/
If mail bounces, replace "10" with the last two digits of the current year
Reply by Nico Coesel, May 23, 2010
Gabor <gabor@alacron.com> wrote:

>On May 21, 6:19 pm, Philip Pemberton <usene...@philpem.me.uk> wrote:
>> OK, this is nuts...
>>
>> With ISE Synthesizer set up like this:
>>   Optimisation Goal:   AREA
>>   Optimisation Effort: NORMAL
>>
>> The core works fine (the timing is a little out, but not bad enough to
>> pooch the whole thing). If I set it up like this:
>>   Optimisation Goal:   SPEED
>>   Optimisation Effort: NORMAL
>>
>> Then the whole thing stops working -- it outright fails to read/write the
>> SDRAM. I can access the SDRAM controller's cache (32 bytes of the current
>> page), but accessing an out-of-page address returns garbage.
>>
>> If I do the same thing on Quartus? Well, the timing looks better in SPEED
>> mode, but it still works fine on the DE1.
>>
>> What the *bleep* is going on?
>>
>> --
>As for SPEED vs. AREA, in Xilinx FPGA's you very often
>get the best overall timing results using AREA optimization
>rather than speed. This is probably because the route
>portion of your total path delay is large. This shows up
>in larger designs and larger parts especially since the
>worst case routing delays grow with the design size.

Actually this is a bit of black art. I also get good results by adjusting
the 'pack factor' (IIRC) which puts related logic closer together. IMHO it
takes some trial and error to find the optimum place & route settings for
a design which gets close to the limits of the FPGA regarding speed and/or
size.

-- 
Failure does not prove something is impossible, failure simply
indicates you are not using the right tools...
nico@nctdevpuntnl (punt=.)
--------------------------------------------------------------
Reply by Gabor, May 23, 2010
On May 21, 6:19 pm, Philip Pemberton <usene...@philpem.me.uk> wrote:
> OK, this is nuts...
>
> With ISE Synthesizer set up like this:
>   Optimisation Goal:   AREA
>   Optimisation Effort: NORMAL
>
> The core works fine (the timing is a little out, but not bad enough to
> pooch the whole thing). If I set it up like this:
>   Optimisation Goal:   SPEED
>   Optimisation Effort: NORMAL
>
> Then the whole thing stops working -- it outright fails to read/write the
> SDRAM. I can access the SDRAM controller's cache (32 bytes of the current
> page), but accessing an out-of-page address returns garbage.
>
> If I do the same thing on Quartus? Well, the timing looks better in SPEED
> mode, but it still works fine on the DE1.
>
> What the *bleep* is going on?
>
> --
> Phil.
> usene...@philpem.me.uk
> http://www.philpem.me.uk/
> If mail bounces, replace "10" with the last two digits of the current year

As others have mentioned, you probably have some unconstrained paths
causing timing violations. If you think you have enough constraints, but
still have this problem, try setting up the post place&route static timing
report for Verbose and enter a good size number like 100 in the option
"report unconstrained paths". Then if you find an unconstrained path that
probably should be constrained you will know what to add to your
constraints.

As for SPEED vs. AREA, in Xilinx FPGA's you very often get the best
overall timing results using AREA optimization rather than speed. This is
probably because the route portion of your total path delay is large. This
shows up in larger designs and larger parts especially since the worst
case routing delays grow with the design size.

Regards,
Gabor
Reply by Nico Coesel, May 22, 2010
Philip Pemberton <usenet10@philpem.me.uk> wrote:

>OK, this is nuts...
>
>With ISE Synthesizer set up like this:
>  Optimisation Goal:   AREA
>  Optimisation Effort: NORMAL
>
>The core works fine (the timing is a little out, but not bad enough to
>pooch the whole thing). If I set it up like this:
>  Optimisation Goal:   SPEED
>  Optimisation Effort: NORMAL
>
>Then the whole thing stops working -- it outright fails to read/write the
>SDRAM. I can access the SDRAM controller's cache (32 bytes of the current
>page), but accessing an out-of-page address returns garbage.
>
>If I do the same thing on Quartus? Well, the timing looks better in SPEED
>mode, but it still works fine on the DE1.
>
>What the *bleep* is going on?

You probably have unconstrained paths which meet timing or not depending
on the routing mode. It is crucial that all paths in an FPGA have timing
constraints (including paths from the inputs to the flipflops and
flipflops to the outputs).

-- 
Failure does not prove something is impossible, failure simply
indicates you are not using the right tools...
nico@nctdevpuntnl (punt=.)
--------------------------------------------------------------
Reply by maxascent, May 22, 2010
The constraints you should have in the ucf are the input clock frequency,
the pin constraints and the IO types you are using for those pins. If the
design passes simulation and meets timing once you have run P&R then you
should be ok. 

Jon

---------------------------------------
Posted through http://www.FPGARelated.com
Reply by Philip Pemberton, May 21, 2010
OK, this is nuts...

With ISE Synthesizer set up like this:
  Optimisation Goal:   AREA
  Optimisation Effort: NORMAL

The core works fine (the timing is a little out, but not bad enough to 
pooch the whole thing). If I set it up like this:
  Optimisation Goal:   SPEED
  Optimisation Effort: NORMAL

Then the whole thing stops working -- it outright fails to read/write the 
SDRAM. I can access the SDRAM controller's cache (32 bytes of the current 
page), but accessing an out-of-page address returns garbage.

If I do the same thing on Quartus? Well, the timing looks better in SPEED 
mode, but it still works fine on the DE1.

What the *bleep* is going on?

-- 
Phil.
usenet10@philpem.me.uk
http://www.philpem.me.uk/
If mail bounces, replace "10" with the last two digits of the current year