PCB routing issues for sync SRAM

Started by radarman March 26, 2010
This isn't strictly an FPGA question, but I figured someone here might
be able to point me in the right direction.

I am designing a board with an Altera EP3C40 in the 240-pin QFP and a
Cypress CY7C1792 synchronous SRAM in the 100-pin QFP. I would like to
operate the SRAM at 200MHz, so I know the routing needs to be somewhat
careful. (I'm internally "dual-porting" the SRAM, and each port needs
to run at 100MHz)

Right now, I have the SRAM on the flip-side of the board from the
FPGA. The RAM has a rectangular footprint, which means that some of
the traces are proportionally longer than others, but the routing is
fairly tight, with traces between 250 and 750mils. Naturally, every
signal is going through a via in this design, but the vias are
literally right next to the pad, so the top-level trace is practically
non-existent for most signals.

The questions are,
1) Do I need to further tighten these up? I have some room left under
the SRAM to lengthen traces (not much, but I might could improve the
delta by 10-20%),
2) Should I try to make the clock line equal the longest non-clock
signal, or leave it at its natural length, which is about midway
(400mil point-to-point)?
3) With the traces this short, does it still make sense to source
terminate the clock? I'm guessing yes, but the density is getting
pretty high around this thing.

I've only done one other "high speed" design, with a Gig-E PHY, but I
was able to get all of the signals to within +/- 5 mils on that board.
It's also not entirely tested yet, so before I spin another board
running even faster, I'd like to get it right.

Note, this is a personal project, so I'm trying to avoid BGAs.

Thanks!
On Mar 27, 9:13 am, radarman <jsham...@gmail.com> wrote:
<snip>
You can reality-check this with a trace-delay ballpark of 150 ps/inch to 190 ps/inch. The clock signal is always the most important, and I have seen designs generate a CLK and a !CLK to lower EMI. Everything else should change on the other edge, so balancing only gets critical on very tight time budgets.
On Mar 26, 5:13 pm, radarman <jsham...@gmail.com> wrote:
> The questions are,
> 1) Do I need to further tighten these up? I have some room left under
> the SRAM to lengthen traces (not much, but I might could improve the
> delta by 10-20%),
6 inches is approximately 1 ns of delay => your 250 - 750 mils is approximately 40 - 120 ps = 0.41% - 1.2% of the timing budget for one way, double that for the round trip...worth keeping track of, but likely not a cause for concern. Clock to output delays and setup time requirements are going to chew up much more of the timing budget.
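For anyone who wants to plug in their own lengths, here is a minimal sketch of the arithmetic above. It assumes the 6 inches per nanosecond rule of thumb; the function names are made up for illustration, and the real delay depends on the stackup. The unrounded results for 250 and 750 mils come out slightly above the 40 - 120 ps quoted, and the percentages roughly double when the period is 5 ns (200 MHz) rather than 10 ns.

```python
# Rule of thumb from the post above: about 6 inches of trace per nanosecond.
# This is an approximation; actual velocity depends on the board stackup.
INCHES_PER_NS = 6.0


def trace_delay_ps(length_mils, inches_per_ns=INCHES_PER_NS):
    """One-way propagation delay in picoseconds for a trace length in mils."""
    length_inches = length_mils / 1000.0
    return length_inches / inches_per_ns * 1000.0


def percent_of_period(delay_ps, clock_mhz):
    """Express a delay as a percentage of one clock period."""
    period_ps = 1e6 / clock_mhz
    return 100.0 * delay_ps / period_ps


if __name__ == "__main__":
    # Shortest trace, clock trace, and longest trace from the original post.
    for mils in (250, 400, 750):
        delay = trace_delay_ps(mils)
        print(f"{mils} mils: {delay:.0f} ps one way, "
              f"{percent_of_period(delay, 100):.2f}% of a 10 ns period, "
              f"{percent_of_period(delay, 200):.2f}% of a 5 ns period")
```

Doubling the one-way figure gives the round-trip estimate.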
> 2) Should I try to make the clock line equal the longest non-clock
> signal, or leave it at its natural length, which is about midway
> (400mil point-to-point)?
Clock line lengths should be matched to other clock line lengths (which is not your situation), not other data signals. Leave it at the natural length.
> 3) With the traces this short, does it still make sense to source
> terminate the clock? I'm guessing yes, but the density is getting
> pretty high around this thing.
A better question to ask yourself is "If I find out later that I need to terminate the clock, how the heck am I going to do it since I didn't provision for one?" Viewed that way, the answer should be obvious. Series terminate with a ~22 ohm resistor and you'll not have any worries about signal quality.

Since the runs are so short anyway, I'd suggest surface routing only for the clock. A fair-sized percentage of the route will be surface traces anyway, just because of the parts placement you've described, so there is no reason not to make it 100% surface. Then the only impedance discontinuity is the one via that takes you from top to bottom.

Kevin Jennings
On Mar 26, 5:13 pm, radarman <jsham...@gmail.com> wrote:
<snip>
I'm a little surprised that there are no cares about clock versus data from others. CARE!

Your synchronous SRAM doesn't have a DLL like SDRAMs. There's one clock that you provide, nothing provided back.

The amount of length matching required is determined by your timing budget. You NEED to put together a timing budget to make sure your clock and data are related better than second cousins once removed.

My *opinion* is that the traces are so extremely short that there would be little benefit from terminations or matching any better than what you have. The reality may be that there's no WAY you could get the speed you're looking for if you generate the clock FROM the FPGA without some extra work.

I believe what I did in my own sync SRAM hookup was to feed the clock to the memory chip AND take the clock from the I/O back to the internals so the clock-to-out delay wasn't a concern; the "input clock" and data aligned. At least they did before the mapper started routing the signal back through an internal logic path rather than the actual pad.

What is your clock source? If you have a 200MHz clock feeding both the SRAM and the FPGA externally, you have a common external reference to work from. Getting the input sampling and clock-to-out times to behave may be difficult. Timing budget!

As for terminations, series terminations are typically used to deal with reflections. You won't get a reflection off 750 mils. But you might get a cleaner clock if there's series impedance (even if it doesn't damp reflections) and/or an AC load impedance for the clock.

Resistors are pretty tiny these days; if it's hobbyist stuff, get a microscope or get your nose dangerously close to your soldering iron by using a loupe. We used 0402 caps between pads under the balls on a BGA for decoupling; surely you could put an 0402 or perhaps even a "large" 0603 inline near the escapes on your QFP.

The timing budget is crucial to your ability to achieve timing for both reads and writes. The data's there in the SRAM data sheet. You need to figure out what you need to make it happen for you. You have clock delays in and out, clock-to-out and setup/hold to deal with, as well as absolute delays and delay skew. It can be done, but you won't get it working without working the budget first.
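As a rough illustration of what working the budget looks like, here is a minimal sketch of the usual setup and hold slack arithmetic for one direction of the interface. Every number in it is a made-up placeholder, not a value from the Cypress or Altera data sheets, and the `Path` structure and function names are hypothetical. The real figures have to come from the data sheets and the FPGA timing report.

```python
from dataclasses import dataclass


@dataclass
class Path:
    """One direction of the interface. All times are in nanoseconds."""
    tco_min: float     # launching device clock-to-out, fastest case
    tco_max: float     # launching device clock-to-out, slowest case
    flight_min: float  # shortest data trace delay
    flight_max: float  # longest data trace delay
    tsu: float         # receiving device setup time
    th: float          # receiving device hold time
    skew: float        # capture clock arrival minus launch clock arrival


def setup_slack(path, period):
    """Time to spare before the capturing edge; negative means a violation."""
    return period - path.tco_max - path.flight_max - path.tsu + path.skew


def hold_slack(path):
    """Time the data stays valid past the hold window; negative is a violation."""
    return path.tco_min + path.flight_min - path.th - path.skew


if __name__ == "__main__":
    period = 5.0  # 200 MHz

    # PLACEHOLDER numbers for illustration only, not data sheet values.
    write = Path(tco_min=1.0, tco_max=2.5, flight_min=0.04, flight_max=0.125,
                 tsu=1.2, th=0.4, skew=0.07)
    read = Path(tco_min=1.2, tco_max=3.0, flight_min=0.04, flight_max=0.125,
                tsu=1.0, th=0.3, skew=-0.07)

    for name, path in (("write (FPGA to SRAM)", write),
                       ("read (SRAM to FPGA)", read)):
        print(f"{name}: setup slack {setup_slack(path, period):.3f} ns, "
              f"hold slack {hold_slack(path):.3f} ns")
```

With numbers of this order, the trace delays are a small term next to clock-to-out and setup, which is why the device figures need to be pinned down first.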
Short is always good, but often more important at these speeds is
the difference in length or propagation time. Keeping the skew down
between signals allows the use of a single clock shifting mechanism
to make capture on return relatively simple to implement, and even to
make an auto-training mechanism.

One trick for these types of devices is to have a clock loop driven
by an I/O and returned by a different I/O. This gives something that
can be used as part of a timing lock loop but tracks changes in the
device I/O for voltage and temperature etc..

John Adair
Enterpoint Ltd. - Home of Raggedstone2. The Spartan-6 PCIe Development
Board.

On Mar 26, 11:28 pm, John_H <newsgr...@johnhandwork.com> wrote:
> I'm a little surprised that there are no cares about clock versus data
> from others. CARE!
Guess you didn't read my post where I detailed the estimate of skew to be ~5% of the timing budget (round trip delay)...and as I said, worth keeping track of, but likely to be far overshadowed by clock to output and setup time requirements. Actually I had originally computed it at half that percentage, twas using 10ns rather than 5 ns clock period.
> Your synchronous SRAM doesn't have a DLL like SDRAMs. There's one
> clock that you provide, nothing provided back.
Just like most synchronous logic...back in the old days...one clock, nothing fancy.
> The amount of length matching required is determined by your timing
> budget. You NEED to put together a timing budget to make sure your
> clock and data are related better than second cousins once removed.
To quote Eric Bogatin, "plug in the numbers". The largest length difference between clock and data as reported by Radarman is 350 mils, which is approximately a 60 ps built-in skew...clock to output delays and setup time requirements of the devices will be the primary concerns since they will be roughly an order of magnitude larger. Even differences in the delay caused by differences in the capacitive loading would probably be more important than the PCB copper delay differences.

Kevin Jennings
On Mar 27, 5:09 pm, KJ <kkjenni...@sbcglobal.net> wrote:
<snip>
I have already added a series termination resistor, since it really isn't that big a deal. I pretty much figured it would be necessary, and a 0402 is certainly doable. I'll just have to squeeze it in among all the bypass caps. (Normally, I use the back of the board for bypass caps, but the SRAM is sitting right under the FPGA.)

Also, I have a global clock input nearby I could bring a feedback clock in on. (nearby being the same bank)

The clock is sourced by a PLL in the FPGA. The master oscillator is 100MHz, which I double to create the SRAM clock. I'm assuming the idea here is that instead of simply connecting the feedback to the output at the FPGA, I loop the signal back from the SRAM clock input, and use that as the feedback instead?

I apologize if that's a stupid question. I'm more of a VHDL/Verilog modeler than a board designer - I'm trying to get my feet wet in hardware.
On Mar 27, 6:09 pm, KJ <kkjenni...@sbcglobal.net> wrote:
> Guess you didn't read my post where I detailed the estimate of skew to
> be ~5% of the timing budget (round trip delay)...and as I said, worth
> keeping track of, but likely to be far overshadowed by clock to output
> and setup time requirements. Actually I had originally computed it at
> half that percentage, twas using 10ns rather than 5 ns clock period.
Sorry, I thought you were suggesting the FPGA internal clock that feeds the IOB that feeds the SRAM was all that was needed. The 5% budget does not include the uncertainties for the path from FPGA internals to the outside world. I'm happy to see that radarman's looking to match those delays (and hopefully put together the full timing budget!).
> Just like most synchronous logic...back in the old days...one clock,
> nothing fancy.
You've got to get fancy if you can't make the clocks behave. The SRAM has nice Tsu/Th and Tco values published relative to the external clock. The delays in the FPGA are a little more unbalanced despite the femtoscopic sampling window at the input register. Getting the DLLs to shift the right way to line up the small valid window from the SRAM is a female dog. If *you* can get the FPGA's Tco and Tsu/Th to balance properly for the SRAM interface (using those persnickety single-ended IO standards) then I have my hat off to you in a deep bow with respect.
On Mar 27, 9:22 pm, radarman <jsham...@gmail.com> wrote:
> I have already added a series termination resistor, since it really
> isn't that big a deal.
<snip>
> Also, I have a global clock input nearby I could bring a feedback
> clock in on. (nearby being the same bank)
If you're going to feed the clock back, then you'll want to parallel terminate to ground rather than series terminate at the source. Series termination makes the edge look good at the end of the run; intermediate places along the net see a step halfway up, a plateau, and then another step up when the reflection comes back. By feeding the clock back, the SRAM will now be right in the middle of the net rather than at the end and will not have a clean edge (the clean edge will be back at the FPGA where the terminator is located).

When you parallel terminate to ground, the signal looks the same virtually everywhere on the net. Again, depending on the edge rates of the FPGA, your runs are short enough that you might not need any termination, so allowing for a terminator is cheap insurance...but you want to have the right insurance.

Kevin Jennings
On Mar 28, 1:22 pm, KJ <kkjenni...@sbcglobal.net> wrote:
> plateau and then another step up when the reflection comes back. By
> feeding the clock back, the SRAM will now be right in the middle of
> the net rather than at the end and will not have a clean edge (the
> clean edge will be back at the FPGA where the terminator is located).
Oops, strike the "where the terminator is located" part. Was trying to say that the series terminated clock would be clean at the input pin on the FPGA where it comes back in...NOT at the FPGA output pin where the terminator is located; that would be the location of the worst case plateau. In any case, at the SRAM, which is at the halfway point in the clock net, there will be a plateau. How big of a plateau depends on the edge rate of the FPGA output pin.

Kevin Jennings
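The plateau described above can be estimated with a simple ideal-line model. This is a sketch under stated assumptions, not a substitute for simulating with the real IBIS models: a lossless line, an unterminated (open) far end, a hypothetical 50 ohm trace impedance, a hypothetical 28 ohm driver output resistance in series with the 22 ohm resistor, and a hypothetical 800 mil loop (400 mils out to the SRAM, 400 mils back to the FPGA). The function names are illustrative only.

```python
# Ideal series-terminated line with an open far end.
# The 6 inches per nanosecond figure is the rule of thumb used earlier
# in the thread; everything else here is an assumed, illustrative value.
PS_PER_INCH = 1000.0 / 6.0


def initial_step_fraction(source_ohms, z0_ohms):
    """Fraction of the full swing launched into the line by the first edge.

    source_ohms is the driver output resistance plus the series resistor.
    """
    return z0_ohms / (source_ohms + z0_ohms)


def plateau_duration_ps(total_mils, tap_mils, ps_per_inch=PS_PER_INCH):
    """How long a tap point sits at the initial step before the reflection
    from the open far end arrives and completes the edge."""
    remaining_inches = (total_mils - tap_mils) / 1000.0
    return 2.0 * remaining_inches * ps_per_inch


if __name__ == "__main__":
    source = 22.0 + 28.0  # series resistor + ASSUMED driver resistance
    z0 = 50.0             # ASSUMED trace impedance
    loop = 800.0          # ASSUMED total loop length in mils

    print(f"initial step: {initial_step_fraction(source, z0):.2f} of full swing")
    for name, tap in (("FPGA output pin", 0.0),
                      ("SRAM clock pin", 400.0),
                      ("FPGA feedback pin", loop)):
        print(f"{name}: plateau lasts about "
              f"{plateau_duration_ps(loop, tap):.0f} ps")
```

In this model the plateau is longest at the FPGA output pin, shorter at the SRAM in the middle of the net, and zero at the far end where the clock comes back in. Whether it is visible at all depends on how the plateau duration compares with the rise time of the FPGA output.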