
sram

Started by kristoff July 22, 2017
brimdavis@gmail.com wrote on 8/8/2017 8:37 PM:
> KJ wrote:
>>
>> It's even easier than that to synchronously control a standard async SRAM.
>> Simply connect WE to the clock and hold OE active all the time except
>> for cycles where you want to write something new into the SRAM.
>>
> As has been explained to you in detail by several other posters, your method is not 'easier' with modern FPGAs and SRAMs.
>
> The simplest way to get a high speed clock {gated or not} off the chip, coincident with other registered I/O signals, is to use the dual-edge IOB flip-flops as I suggested.
>
> The DDR technique I mentioned would run synchronous single-cycle read or write cycles at 50 MHz on a Spartan-3 Starter kit with an (IIRC) 10 ns SRAM, 66 MHz if using a duty-cycle-skewed clock to meet the WE pulse width requirements.
>
> Another advantage of the 'forwarding' method is that one can use the internal FPGA clock resources for clock multiply/divides etc. without needing to also manage the board-level low-skew clock distribution needed by your method.

I can't say I follow what you are proposing. How do you get the clock out of the FPGA with a defined time relationship to the signals clocked through the IOB? Is this done with feedback from the output clock using the internal clocking circuits?

--
Rick C
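The 50 MHz and 66 MHz figures quoted above can be sanity-checked with a little arithmetic. This is only a rough sketch: it assumes WE is low for a fixed fraction of each clock period, and the 8 ns minimum WE pulse width (`T_WP_NS`) is an assumed placeholder, not a value taken from the thread or from any particular SRAM data sheet.

```python
def we_low_ns(clock_mhz: float, low_duty: float = 0.5) -> float:
    """Width of the WE low pulse, assuming WE is low for `low_duty` of each clock period."""
    return (1000.0 / clock_mhz) * low_duty


def min_low_duty(clock_mhz: float, t_wp_ns: float) -> float:
    """Smallest low-time fraction of the period that still meets the minimum WE pulse width."""
    return t_wp_ns / (1000.0 / clock_mhz)


T_WP_NS = 8.0  # ASSUMED minimum WE pulse width; take the real value from the SRAM data sheet

for mhz in (50.0, 66.0):
    width = we_low_ns(mhz)
    verdict = "meets" if width >= T_WP_NS else "misses"
    print(f"{mhz:.0f} MHz, 50% duty: WE low for {width:.2f} ns, {verdict} {T_WP_NS} ns")

print(f"66 MHz needs WE low for at least {min_low_duty(66.0, T_WP_NS):.0%} of the period")
```

Under that assumed pulse width, a 50% duty cycle is comfortable at 50 MHz but falls short at 66 MHz, which is consistent with the remark about needing a duty-cycle-skewed clock.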
On Wed, 09 Aug 2017 22:33:40 -0400, rickman wrote:

> I can't say I follow what you are proposing. How do you get the clock
> out of the FPGA with a defined time relationship to the signals clocked
> through the IOB? Is this done with feedback from the output clock using
> the internal clocking circuits?
About a decade back, mainstream FPGAs gained greatly expanded IOB clocking abilities to support DDR RAM (and other interfaces such as RGMII).

In particular, one can forward a clock out of an FPGA pin phase aligned with data on other pins. You can also use one of the internal PLLs to generate phase shifted clocks, and thus have a phase shift on the pins between two data signals or between the clock and the data signals.

This can be done without needing feedback from the pins.

You should try reading a datasheet occasionally - they can be very informative.
Just in case someone has blocked Google where you are, here's an example:
https://www.xilinx.com/support/documentation/user_guides/ug571-ultrascale-selectio.pdf

Allan
rickman wrote:
>
> I can't say I follow what you are proposing. How do you get
> the clock out of the FPGA with a defined time relationship
> to the signals clocked through the IOB?
>
The links I gave in my original post explain the technique:
>>
>> I posted some notes on this technique (for a Spartan-3) to the fpga-cpu group
>> many years ago:
>> https://groups.yahoo.com/neo/groups/fpga-cpu/conversations/messages/2076
>> https://groups.yahoo.com/neo/groups/fpga-cpu/conversations/messages/2177
>>
Allan Herriman wrote:
>
> About a decade back, mainstream FPGAs gained greatly expanded IOB
> clocking abilities to support DDR RAM (and other interfaces such as
> RGMII).
>
Nearly twenty years now!

Xilinx parts had ODDR equivalents in Virtex-E using hard macros; then the actual ODDR primitive stuff appeared in Virtex-2.

-Brian
Allan Herriman wrote on 8/10/2017 2:02 AM:
> About a decade back, mainstream FPGAs gained greatly expanded IOB
> clocking abilities to support DDR RAM (and other interfaces such as
> RGMII).
> In particular, one can forward a clock out of an FPGA pin phase aligned
> with data on other pins. You can also use one of the internal PLLs to
> generate phase shifted clocks, and thus have a phase shift on the pins
> between two data signals or between the clock and the data signals.
>
> This can be done without needing feedback from the pins.
>
> You should try reading a datasheet occasionally - they can be very
> informative.
> Just in case someone has blocked Google where you are: here's an example:
> https://www.xilinx.com/support/documentation/user_guides/ug571-ultrascale-selectio.pdf
Thank you for the link to the 356 page document. No, I have not researched how every brand of FPGA implements DDR interfaces, mostly because I have not designed a DDR memory interface in an FPGA. I did look at the document and didn't find info on how the timing delays through the IOB might be synchronized with the output clock.

So how exactly does the tight alignment of a clock exiting a Xilinx FPGA maintain alignment with data exiting the FPGA over time and differential temperature? What will the timing relationship be and how tightly can it be maintained?

Just waving your hands and saying things can be aligned doesn't explain how it works. This is a discussion. If you aren't interested in discussing, then please don't bother to reply.

--
Rick C
brimdavis@gmail.com wrote on 8/10/2017 7:46 PM:
> rickman wrote:
>>
>> I can't say I follow what you are proposing. How do you get
>> the clock out of the FPGA with a defined time relationship
>> to the signals clocked through the IOB?
>>
>
> The links I gave in my original post explain the technique:
>>>
>>> I posted some notes on this technique (for a Spartan-3) to the fpga-cpu group
>>> many years ago:
>>> https://groups.yahoo.com/neo/groups/fpga-cpu/conversations/messages/2076
>>> https://groups.yahoo.com/neo/groups/fpga-cpu/conversations/messages/2177
>>>
I haven't used a Xilinx part in something like 15 years, so I don't recall all the details. I don't follow how you achieve the timing margin needed between the address, control and data signals, which are passing through the IOB, and the WE signal pulse, which is being generated in the IOB DDR. Even with a hold time requirement of 0 ns, something has to be done to prevent a race condition.

Your posts seem to say you used different drive strengths to use the trace capacitance to create different delays in signal timing. If you can't use a data sheet to produce a timing analysis, it would seem to be a fairly sketchy method that you can't count on to work under all conditions. I suppose you could qualify the circuit over temperature and voltage and then make some assumptions about process variability, but as I say, sketchy.

--
Rick C
On 8/10/17 10:39 PM, rickman wrote:
> Thank you for the link to the 356 page document. No, I have not
> researched how every brand of FPGA implements DDR interfaces mostly
> because I have not designed a DDR memory interface in an FPGA. I did
> look at the document and didn't find info on how the timing delays
> through the IOB might be synchronized with the output clock.
>
> So how exactly does the tight alignment of a clock exiting a Xilinx FPGA
> maintain alignment with data exiting the FPGA over time and differential
> temperature? What will the timing relationship be and how tightly can
> it be maintained?
>
> Just waving your hands and saying things can be aligned doesn't explain
> how it works. This is a discussion. If you aren't interested in
> discussing, then please don't bother to reply.
>
Thinking about it, YES, FPGAs normally have a few pins that can be configured as dedicated clock drivers, and it will generally be guaranteed that if those pins are driving out a global clock, then any other pin with output clocked by that clock will change so as to have a known hold time (over specified operating conditions). This is the way to run a typical synchronous interface.

Since this method requires the WE signal to be the clock, you need to find a part that has either a write mask signal, or perhaps is multi-ported so this port could be dedicated to writes and another port could be used to read what is needed (the original part for this thread wouldn't be usable with this method).
Richard Damon wrote on 8/11/2017 12:09 AM:
> Thinking about it, YES, FPGAs normally have a few pins that can be
> configured as dedicated clock drivers, and it will generally be guaranteed
> that if those pins are driving out a global clock, then any other pin with
> output clocked by that clock will change so as to have a known hold time
> (over specified operating conditions). This being the way to run a typical
> synchronous interface.
>
> Since this method requires the WE signal to be the clock, you need to find a
> part that has either a write mask signal, or perhaps is multi-ported so this
> port could be dedicated to writes and another port could be used to read
> what is needed (the original part for this thread wouldn't be usable with
> this method).
I'm not sure you read the full thread. The method for generating the WE signal is to use the two DDR FFs to drive a logic one during one half of the clock and to drive the write signal during the other half of the clock. I misspoke above when I called it a "clock". The *other* method involved using the actual clock as WE and gating it with the OE signal, which won't work on all async RAMs.

So with the DDR method *all* of the signals will exit the chip with a nominal zero timing delay relative to each other. This is literally the edge of the async RAM spec. So you need to have some delays on the other signals relative to the WE to allow for variation in timing of individual outputs. It seems the method suggested is to drive the CS and WE signals hard and lighten the drive on the other outputs.

This is a method that is not relying on any guaranteed spec from the FPGA maker. This method uses trace capacitance to create delta t = delta v * c / i to speed or slow the rising edge of the various outputs. This relies on overcompensating the FPGA spec by means that depend on details of the board layout. It reminds me of the early days of generating timing signals for DRAM with logic delays.

Yeah, you might get it to work, but the layout will need to be treated with care and respect even more so than an impedance controlled trace. It will need to be characterized over temperature and voltage and you will have to design in enough margin to allow for process variations.

--
Rick C
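To put rough numbers on the delta t = delta v * c / i relation mentioned above, here is a minimal sketch. It treats the output driver as an ideal constant-current source charging a lumped capacitance, which real I/O buffers are not, and every value in it (voltage swing, load capacitance, drive currents) is an assumed placeholder rather than a figure from the thread.

```python
def slew_time_ns(delta_v: float, cap_pf: float, drive_ma: float) -> float:
    """Ideal constant-current charging time: t = V * C / I.

    With V in volts, C in picofarads and I in milliamps the result is in
    nanoseconds (1e-12 F / 1e-3 A = 1e-9 s per volt).
    """
    return delta_v * cap_pf / drive_ma


DELTA_V = 1.65      # ASSUMED swing to the receiver threshold, in volts
LOAD_PF = 10.0      # ASSUMED lumped trace plus pin capacitance, in pF
STRONG_MA = 12.0    # ASSUMED drive setting for WE and CS
WEAK_MA = 4.0       # ASSUMED drive setting for address and data

t_strong = slew_time_ns(DELTA_V, LOAD_PF, STRONG_MA)
t_weak = slew_time_ns(DELTA_V, LOAD_PF, WEAK_MA)
print(f"strong drive: {t_strong:.3f} ns, weak drive: {t_weak:.3f} ns")
print(f"extra delay on the weakly driven pins: {t_weak - t_strong:.3f} ns")
```

The point of the sketch is only that the delay difference scales directly with the load capacitance, so in this simplified model it depends on the board as much as on the FPGA.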
On Thu, 10 Aug 2017 16:46:13 -0700, brimdavis wrote:

> rickman wrote:
>>
>> I can't say I follow what you are proposing. How do you get the clock
>> out of the FPGA with a defined time relationship to the signals clocked
>> through the IOB?
>>
> The links I gave in my original post explain the technique:
>>>
>>> I posted some notes on this technique (for a Spartan-3) to the
>>> fpga-cpu group many years ago:
>>> https://groups.yahoo.com/neo/groups/fpga-cpu/conversations/messages/2076
>>> https://groups.yahoo.com/neo/groups/fpga-cpu/conversations/messages/2177
>>>
> Allan Herriman wrote:
>>
>> About a decade back, mainstream FPGAs gained greatly expanded IOB
>> clocking abilities to support DDR RAM (and other interfaces such as
>> RGMII).
>>
> Nearly twenty years now!
>
> Xilinx parts had ODDR equivalents in Virtex-E using hard macros; then
> the actual ODDR primitive stuff appeared in Virtex-2.
Nearly twenty years! Doesn't time fly when you're having fun.

Thinking back, the last time I connected an async SRAM to an FPGA was in 1997, using a Xilinx 5200 series device. The 5200 was a low cost family, a bit like the XC4000 series, but with even worse routing resources, and (keeping it on-topic for this thread) NO IOB FF. Yes, that's right, to get repeatable IO timing, one had to LOC a fabric FF near the pin and do manual routing from that FF to the pin. (The manual routing could be saved as a string in a constraints file, IIRC.)

Still, I managed to meet all the SRAM timing requirements, but only by using two clocks for each RAM read or write. The write strobe used a negative edge triggered FF.

"And if you tell that to the young people today, they won't believe you"

Regards,
Allan
On Thu, 10 Aug 2017 22:39:39 -0400, rickman wrote:

> Thank you for the link to the 356 page document. No, I have not
> researched how every brand of FPGA implements DDR interfaces mostly
> because I have not designed a DDR memory interface in an FPGA. I did
> look at the document and didn't find info on how the timing delays
> through the IOB might be synchronized with the output clock.
>
> So how exactly does the tight alignment of a clock exiting a Xilinx FPGA
> maintain alignment with data exiting the FPGA over time and differential
> temperature? What will the timing relationship be and how tightly can
> it be maintained?
>
> Just waving your hands and saying things can be aligned doesn't explain
> how it works. This is a discussion. If you aren't interested in
> discussing, then please don't bother to reply.
As you say you've never done DDR I'll give a simple explanation here, using Xilinx primitives as an example.

The clock forwarding is not the same as connecting an internal clock net to an output pin. Instead, it is output through an ODDR, in exactly the same way that the DDR output data is produced. (Except in this case, instead of outputting two data phases, D1 and D2, it just outputs two constants, '1' and '0' (or '0' and '1' if you want the opposite phase), to produce a square wave.)

The clock-forwarding output and the data output ODDR blocks are all clocked from the same clock on a low skew internal clock net. This will typically have some tens of ps (to hundreds of ps, depending on the particular clocking resource) skew. There will also be skew due to the different trace lengths for each signal in the BGA interposer, but these are known and can be compensated for in the PCB design.

Perhaps you want deliberate skew between the clock and data (e.g. for RGMII) - there are two ways of doing that:

1. Use an ODELAY block on (a subset of) the outputs. ODELAY sits between the ODDR output and the input of the OBUF pin driver. The ODELAY is calibrated by a reference clock, and thus is stable against PVT. It has a delay programmable between ~0 and a few ns. It has an accuracy of some tens of ps, and produces some tens of ps jitter on the signal passing through it.

2. Use a PLL (or MMCM) to produce deliberately skewed system clocks inside the FPGA. These will need separate clocking resources to get to the IO blocks (leading to some hundreds of ps of additional, unknown skew).

More details can be found in the user guide that I linked earlier.

Allan
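The ODDR behaviour described above can be illustrated with a small behavioral model. This is a simplified sketch in Python, not vendor code: the `oddr` function is a hypothetical stand-in that just presents D1 during the first half of each clock cycle and D2 during the second half. The WE example follows the scheme described earlier in the thread (one half held at the idle level, the other half carrying the write strobe); which half is held idle, and the active-low polarity, are assumptions made here for illustration.

```python
def oddr(d1, d2):
    """Simplified ODDR model: one output level per half-cycle.

    For each clock cycle the pin shows d1 after the rising edge and
    d2 after the falling edge.
    """
    out = []
    for first_half, second_half in zip(d1, d2):
        out.extend([first_half, second_half])
    return out


CYCLES = 4

# Clock forwarding: constant '1' on D1 and constant '0' on D2 give a square wave.
forwarded_clock = oddr([1] * CYCLES, [0] * CYCLES)

# WE strobe (ASSUMED active low): idle level in the first half of every cycle,
# low in the second half only on cycles that perform a write.
write_cycle = [False, True, True, False]
we_pin = oddr([1] * CYCLES, [0 if w else 1 for w in write_cycle])

print("clock:", forwarded_clock)
print("WE   :", we_pin)
```

Because the clock (or WE) pin and the data pins all pass through the same kind of output register on the same internal clock, they share the same clock-to-pad path, which is what gives the defined timing relationship the thread is asking about.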
rickman wrote:

> I haven't used a Xilinx part in something like 15 years.
Then maybe you shouldn't post comments like this:
> This is a method that is not relying on any guaranteed spec
> from the FPGA maker. This method uses trace capacitance to
> create delta t = delta v * c / i to speed or slow the rising
> edge of the various outputs.
Xilinx characterizes and publishes I/O buffer switching parameters vs. IOSTANDARD/SLEW/DRIVE settings; this information is both summarized in the datasheet and used in generating the timing reports, providing the base delay of the I/O buffer independent of any external capacitive loading [1].

The I/O drive values I used in my S3 testing provided an I/O buffer delay difference of about 1 ns (at the fast device corner) between WE and the address/data lines.

While these I/O pins will be slowed further by any board level loading, for any reasonable board layout it is improbable that this loading will somehow reverse the WE timing relationship and violate the zero-ns hold requirement.

My original 2004 posts clearly specified what was (timing at FPGA pins) and wasn't (board level signal integrity issues) covered in my example:
>>
>> - board level timing hasn't been looked at ( note that S3
>> timing reports don't include output buffer loading )
>>
For purposes of a demo example design, I'm perfectly happy with an address/data hold of 10% of the SRAM minimum cycle time, given that the SRAM hold specification is zero ns.

If a design needs more precise control, many of the newer parts have calibrated I/O delays (already mentioned by Allan) that can be used to produce known time delays; in the older S3 family, the easiest way to provide an adjustable time delay would be to use a DCM to phase shift the clock to the OFDDRRSE flip-flop primitive driving WE.

-Brian

[1] DS099, S3 data sheet v3.1
https://www.xilinx.com/support/documentation/data_sheets/ds099.pdf
page 83:
"
" The Output timing for all standards, as published in the speed files
" and the data sheet, is always based on a CL value of zero.
"
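The hold-margin argument above reduces to a subtraction, sketched below. Only the roughly 1 ns delay difference, the zero-ns hold requirement and the 10 ns SRAM cycle come from the thread; the absolute output delays are assumed placeholders chosen to give that 1 ns difference, and board-level loading is ignored, as it was in the original example.

```python
def hold_margin_ns(t_we_out_ns: float, t_addr_out_ns: float, t_hold_req_ns: float = 0.0) -> float:
    """Hold margin at the SRAM pins when WE and address/data launch on the same clock edge.

    WE rises t_we_out_ns after the edge; address/data change t_addr_out_ns after it.
    The margin is how much longer address/data stay valid than the SRAM requires.
    """
    return (t_addr_out_ns - t_we_out_ns) - t_hold_req_ns


T_WE_OUT_NS = 2.0     # ASSUMED clock-to-pad delay for the strongly driven WE pin
T_ADDR_OUT_NS = 3.0   # ASSUMED delay for address/data, about 1 ns slower as in the post
SRAM_CYCLE_NS = 10.0  # minimum cycle time of the 10 ns SRAM

margin = hold_margin_ns(T_WE_OUT_NS, T_ADDR_OUT_NS)
print(f"hold margin: {margin:.1f} ns ({margin / SRAM_CYCLE_NS:.0%} of the SRAM cycle)")
```

A real analysis would take both delays from the timing report at the same device corner and then add the board-level effects that the post explicitly leaves out.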