comp.arch.fpga | PCB routing issues for sync SRAM

This isn't strictly an FPGA question, but I figured someone here might
be able to point me in the right direction.

I am designing a board with an Altera EP3C40 in the 240-pin QFP and a
Cypress CY7C1792 static SRAM in the 100 pin QFP. I would like to
operate the SRAM at 200MHz, so I know the routing needs to be somewhat
careful. (I'm internally "dual-porting" the SRAM, and each port needs
to run at 100MHz)

Right now, I have the SRAM on the flip-side of the board from the
FPGA. The RAM has a rectangular footprint, which means that some of
the traces are proportionally longer than others, but the routing is
fairly tight, with traces between 250 and 750mils. Naturally, every
signal is going through a via in this design, but the vias are
literally right next to the pad, so the top-level trace is practically
non-existent for most signals.

The questions are,
1) Do I need to further tighten these up? I have some room left under
the SRAM to lengthen traces (not much, but I might could improve the
delta by 10-20%),
2) Should I try to make the clock line equal the longest non-clock
signal, or leave it at its natural length, which is about midway
(400mil point-to-point)?
3) With the traces this short, does it still make sense to source
terminate the clock? I'm guessing yes, but the density is getting
pretty high around this thing.

I've only done one other "high speed" design, with a Gig-E PHY, but I
was able to get all of the signals to within +/- 5 mils on that board.
It's also not entirely tested yet, so before I spin another board
running even faster, I'd like to get it right.

Note, this is a personal project, so I'm trying to avoid BGAs.

Thanks!

Reply by -jg ●March 26, 2010

 You can reality-check this with a trace-delay ballpark of between
150 ps/inch and 190 ps/inch.
 The clock signal is always the most important, and I have seen
designs generate CLK and !CLK to lower EMC.
 Everything else should change on the other edge, so balancing only
gets critical on very tight time budgets.

Reply by KJ ●March 26, 2010

On Mar 26, 5:13 pm, radarman <jsham...@gmail.com> wrote:
>
> The questions are,
> 1) Do I need to further tighten these up? I have some room left under
> the SRAM to lengthen traces (not much, but I might could improve the
> delta by 10-20%),

6 inches is approximately 1 ns of delay => your 250 - 750 mils is
approximately 40 - 120 ps = 0.41% - 1.2% of the timing budget for one
way, double that for the round trip...worth keeping track of, but
likely not a cause for concern.  Clock to output delays and setup time
requirements are going to chew up much more of the timing budget.
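
The estimate above can be reproduced with a few lines of Python. This is only a rough sketch: it assumes the 6 inches per nanosecond rule of thumb quoted in the post and treats one full clock period as the "timing budget". The function names are made up for illustration.

```python
# Rule of thumb from the post: a signal covers about 6 inches per nanosecond.
INCHES_PER_NS = 6.0


def trace_delay_ps(length_mils: float) -> float:
    """One-way propagation delay in picoseconds for a trace length in mils."""
    length_inches = length_mils / 1000.0
    return length_inches / INCHES_PER_NS * 1000.0


def percent_of_period(delay_ps: float, period_ns: float) -> float:
    """Express a delay as a percentage of one clock period."""
    return delay_ps / (period_ns * 1000.0) * 100.0


if __name__ == "__main__":
    for period_ns in (10.0, 5.0):  # 100 MHz and 200 MHz
        for length_mils in (250, 750):
            one_way = trace_delay_ps(length_mils)
            print(
                f"{period_ns:4.1f} ns period, {length_mils} mils: "
                f"{one_way:6.1f} ps one way = "
                f"{percent_of_period(one_way, period_ns):.2f}% of period, "
                f"round trip = {percent_of_period(2 * one_way, period_ns):.2f}%"
            )
```

Running it for both a 10 ns and a 5 ns period shows how much the percentage depends on which clock period is taken as the budget.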

> 2) Should I try to make the clock line equal the longest non-clock
> signal, or leave it at its natural length, which is about midway
> (400mil point-to-point)?

Clock line lengths should be matched to other clock line lengths
(which is not your situation), not other data signals.  Leave it at
the natural length.

> 3) With the traces this short, does it still make sense to source
> terminate the clock? I'm guessing yes, but the density is getting
> pretty high around this thing.
>

A better question to ask yourself is "If I find out later that I need
to terminate the clock, how the heck am I going to do it since I
didn't provision for one?"  Viewed that way, the answer should be
obvious.

Series terminate with a ~22 ohm resistor and you'll not have any
worries about signal quality.

Since the runs are so short anyway, I'd suggest surface route only for
the clock since a fair sized percentage of the route will be surface
traces just because of the parts placement you've described so there
is no reason not to make it 100% surface, then the only impedance
discontinuity is the one via that takes you from top to bottom.

Kevin Jennings

Reply by John_H ●March 27, 2010

I'm a little surprised that there are no cares about clock versus data
from others.  CARE!

Your synchronous SRAM doesn't have a DLL like SDRAMs.  There's one
clock that you provide, nothing provided back.

The amount of length matching required is determined by your timing
budget.  You NEED to put together a timing budget to make sure your
clock and data are related better than second cousins once removed.

My *opinion* is that the traces are so extremely short that there would
be little benefit from terminations or matching any better than what
you have.  The reality may be that there's no WAY you could get the
speed you're looking for if you generate the clock FROM the FPGA
without some extra work.  I believe what I did in my own sync SRAM
hookup was to feed the clock to the memory chip AND take the clock
from the I/O back to the internals so the clock-to-out delay wasn't a
concern; the "input clock" and data aligned.  At least they did before
the mapper started routing the signal back through an internal logic
path rather than the actual pad.

What is your clock source?  If you have a 200MHz clock feeding both
the SRAM and the FPGA externally, you have a common external reference
to work from.  Getting the input sampling and clock to out times to
behave may be difficult.  Timing budget!

As for terminations, series terminations are typically used to deal
with reflections.  You won't get a reflection off 750 mils.  But you
might get a cleaner clock if there's series impedance (even if it
doesn't damp reflections) and/or an AC load impedance for the clock.
Resistors are pretty tiny these days; if it's hobbyist stuff, get a
microscope or get your nose dangerously close to your soldering iron
by using a loupe.  We used 0402 caps between pads under the balls on a
BGA for decoupling, surely you could put an 0402 or perhaps even a
"large" 0603 inline near the escapes on your QFP.

The timing budget is crucial to your ability to achieve timing for
both reads and writes.  The data's there in the SRAM data sheet.  You
need to figure out what you need to make it happen for you.  You have
clock delays in and out, clock-to-out and setup/hold to deal with as
well as absolute delays and delay skew.  It can be done but you won't
get it working without working the budget first.
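
For anyone who has not put a timing budget together before, here is a minimal sketch of the bookkeeping in Python. Every number in it is a placeholder chosen only so the script runs; none of them come from the SRAM or FPGA data sheets, and the real values have to be looked up there. The `Path` class and its field names are invented for this example, and the model assumes a simple single-clock transfer where the only clock skew is the clock trace delay.

```python
from dataclasses import dataclass


@dataclass
class Path:
    """One register-to-register transfer across the board. All times in ns."""
    t_co_max: float    # launching device clock-to-output, slowest case
    t_co_min: float    # launching device clock-to-output, fastest case
    t_data_max: float  # delay of the longest data trace
    t_data_min: float  # delay of the shortest data trace
    t_clk_skew: float  # capture clock arrival minus launch clock arrival
    t_su: float        # capturing device setup requirement
    t_h: float         # capturing device hold requirement


def setup_slack(p: Path, period: float) -> float:
    """Positive result means data arrives early enough for the next edge."""
    return period + p.t_clk_skew - (p.t_co_max + p.t_data_max) - p.t_su


def hold_slack(p: Path) -> float:
    """Positive result means data stays stable long enough after the edge."""
    return (p.t_co_min + p.t_data_min) - p.t_clk_skew - p.t_h


# PLACEHOLDER values for the FPGA-to-SRAM (write) direction.
# Trace delays use roughly 6 in/ns on 250, 400 and 750 mil traces;
# the device timings are invented and must be replaced with data sheet numbers.
write_path = Path(
    t_co_max=3.0, t_co_min=1.0,
    t_data_max=0.125, t_data_min=0.042,
    t_clk_skew=0.067,
    t_su=1.4, t_h=0.4,
)

if __name__ == "__main__":
    period_ns = 5.0  # 200 MHz
    print(f"setup slack: {setup_slack(write_path, period_ns):.3f} ns")
    print(f"hold slack:  {hold_slack(write_path):.3f} ns")
```

The read direction needs its own `Path` with the SRAM's clock-to-output and the FPGA's setup/hold, plus whatever delay sits between the FPGA's internal clock and its clock output pin.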

Reply by John Adair ●March 27, 2010

Short is always good but often more important at these speeds is
difference in length or propagation time. Keeping the skew down
between signals allows the use of a single clock shifting mechanism
to make capture on return relatively simple to implement and even to
make an auto training mechanism.

One trick for these types of devices is to have a clock loop driven
by an I/O and returned by a different I/O. This gives something that
can be used as part of a timing lock loop but tracks changes in the
device I/O for voltage and temperature etc.

John Adair
Enterpoint Ltd. - Home of Raggedstone2. The Spartan-6 PCIe Development
Board.

Reply by KJ ●March 27, 2010

On Mar 26, 11:28 pm, John_H <newsgr...@johnhandwork.com> wrote:
>
> I'm a little surprised that there are no cares about clock versus data
> from others.  CARE!
>

Guess you didn't read my post where I detailed the estimate of skew to
be ~5% of the timing budget (round trip delay)...and as I said, worth
keeping track of, but likely to be far overshadowed by clock to output
and setup time requirements.  Actually I had originally computed it at
half that percentage, twas using 10ns rather than 5 ns clock period.

> Your synchronous SRAM doesn't have a DLL like SDRAMs.  There's one
> clock that you provide, nothing provided back.
>

Just like most synchronous logic...back in the old days...one clock,
nothing fancy.

> The amount of length matching required is determined by your timing
> budget.  You NEED to put together a timing budget to make sure your
> clock and data are related better than second cousins once removed.
>

To quote Eric Bogatin, "plug in the numbers".  The largest length
difference between clock and data as reported by Radarman is 350 mils
which is approximately a 60 ps built-in skew...clock to output delays
and setup time requirements of the devices will be the primary
concerns since they will be roughly an order of magnitude larger.
Even differences in the delay caused by differences in the capacitive
loading would probably be more important than the PCB copper delay
differences.

Kevin Jennings
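
To "plug in the numbers" for the clock-to-data mismatch, the sketch below takes the lengths radarman reported (clock about 400 mils, data between 250 and 750 mils) and converts the worst-case difference to picoseconds using the per-inch delay figures mentioned in this thread. The names are invented for the example, and the per-inch figures are ballpark estimates rather than measurements of this board.

```python
# Lengths reported in the original post, in mils.
CLOCK_MILS = 400
DATA_MILS = (250, 750)

# Per-inch delay estimates mentioned in the thread, in ps/inch.
PS_PER_INCH = {
    "150 ps/inch": 150.0,
    "6 in/ns": 1000.0 / 6.0,
    "190 ps/inch": 190.0,
}


def skew_ps(length_a_mils: float, length_b_mils: float, ps_per_inch: float) -> float:
    """Delay difference in ps between two traces of the given lengths."""
    return abs(length_a_mils - length_b_mils) / 1000.0 * ps_per_inch


# Largest length difference between the clock and any data trace.
worst_mismatch_mils = max(abs(CLOCK_MILS - d) for d in DATA_MILS)

if __name__ == "__main__":
    print(f"worst clock/data mismatch: {worst_mismatch_mils} mils")
    for label, rate in PS_PER_INCH.items():
        skew = skew_ps(CLOCK_MILS, CLOCK_MILS + worst_mismatch_mils, rate)
        print(f"  at {label}: {skew:.1f} ps")
```

Whichever estimate is used, the built-in skew stays in the tens of picoseconds.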

Reply by radarman ●March 27, 2010

I have already added a series termination resistor, since it really
isn't that big a deal. I pretty much figured it would be necessary,
and a 0402 is certainly doable. I'll just have to squeeze it in among
all the bypass caps. (normally, I use the back of the board for bypass
caps, but the SRAM is sitting right under the FPGA)

Also, I have a global clock input nearby I could bring a feedback
clock in on. (nearby being the same bank) The clock is sourced by a
PLL in the FPGA. The master oscillator is 100MHz, which I double to
create the SRAM clock. I'm assuming the idea here is that instead of
simply connecting the feedback to the output at the FPGA, I loop the
signal back from the SRAM clock input, and use that as the feedback
instead?

 I apologize if that's a stupid question. I'm more of a VHDL/Verilog
modeler than a board designer - I'm trying to get my feet wet in
hardware.

Reply by John_H ●March 28, 2010

On Mar 27, 6:09 pm, KJ <kkjenni...@sbcglobal.net> wrote:
>
> Guess you didn't read my post where I detailed the estimate of skew to
> be ~5% of the timing budget (round trip delay)...and as I said, worth
> keeping track of, but likely to be far overshadowed by clock to output
> and setup time requirements.  Actually I had originally computed it at
> half that percentage, twas using 10ns rather than 5 ns clock period.

Sorry, I thought you were suggesting the FPGA internal clock that
feeds the IOB that feeds the SRAM was all that was needed.  The 5%
budget does not include the uncertainties for the path from FPGA
internals to the outside world.  I'm happy to see that radarman's
looking to match those delays (and hopefully put together the full
timing budget!).

> Just like most synchronous logic...back in the old days...one clock,
> nothing fancy.

You've got to get fancy if you can't make the clocks behave.  The SRAM
has nice Tsu/Th and Tco values published relative to the external
clock.  The delays in the FPGA are a little more unbalanced despite
the femtoscopic sampling window at the input register.  Getting the
DLLs to shift the right way to line up the small valid window from the
SRAM is a female dog.  If *you* can get the FPGA's Tco and Tsu/Th to
balance properly for the SRAM interface (using those persnickety
single-ended IO standards) then I have my hat off to you in a deep bow
with respect.

Reply by KJ ●March 28, 2010

On Mar 27, 9:22 pm, radarman <jsham...@gmail.com> wrote:
>
> I have already added a series termination resistor, since it really
> isn't that big a deal.
<snip>
> Also, I have a global clock input nearby I could bring a feedback
> clock in on. (nearby being the same bank)

If you're going to feed the clock back, then you'll want to parallel
terminate to ground rather than series terminate at the source.
Series termination makes the edge look good at the end of the run,
intermediate places along the net look like a step half way up and a
plateau and then another step up when the reflection comes back.  By
feeding the clock back, the SRAM will now be right in the middle of
the net rather than at the end and will not have a clean edge (the
clean edge will be back at the FPGA where the terminator is located).

When you parallel terminate to ground, the signal looks the same
virtually everywhere on the net.

Again, depending on the edge rates of the FPGA, your runs are short
enough that you might not need any termination, so allowing for a
terminator is cheap insurance...but you want to have the right
insurance.

Kevin Jennings

Reply by KJ ●March 28, 2010

On Mar 28, 1:22 pm, KJ <kkjenni...@sbcglobal.net> wrote:
> plateau and then another step up when the reflection comes back.  By
> feeding the clock back, the SRAM will now be right in the middle of
> the net rather than at the end and will not have a clean edge (the
> clean edge will be back at the FPGA where the terminator is located).
>

Oops, strike the "where the terminator is located" part.  Was trying
to say that the series terminated clock would be clean at the input
pin on the FPGA where it comes back in...NOT at the FPGA output pin
where the terminator is located, that would be the location of the
worst case plateau.

In any case, at the SRAM, which is at the halfway point in the clock
net, there will be a plateau.  How big of a plateau depends on the
edge rate of the FPGA output pin.

Kevin Jennings
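
A rough way to size the plateau described above: on a series-terminated net, a point partway along the trace sees the initial half-height step, and the reflection from the far end arrives after the round trip from that point to the far end and back. The sketch below assumes the 6 in/ns rule of thumb from earlier in the thread, a 400 mil run from the FPGA to the SRAM, and an equal 400 mil return run to the FPGA clock input. The return length and the 1 ns edge time are assumptions made only for illustration, not figures from the thread or from any data sheet.

```python
PS_PER_INCH = 1000.0 / 6.0  # about 167 ps/inch, from the 6 in/ns rule of thumb


def plateau_ps(distance_to_far_end_mils: float) -> float:
    """Time between the incident half step and the returning reflection
    at a point that is the given distance from the far end of the net."""
    return 2.0 * distance_to_far_end_mils / 1000.0 * PS_PER_INCH


# ASSUMED geometry: FPGA output -> 400 mils -> SRAM -> 400 mils -> FPGA input.
TAP_POINTS_MILS = {
    "FPGA output pin (series resistor)": 800,
    "SRAM clock pin (midpoint)": 400,
    "FPGA feedback input (far end)": 0,
}

ASSUMED_EDGE_PS = 1000.0  # hypothetical 1 ns output edge, not a data sheet value

if __name__ == "__main__":
    for name, distance in TAP_POINTS_MILS.items():
        t = plateau_ps(distance)
        print(f"{name}: plateau about {t:.0f} ps, "
              f"{t / ASSUMED_EDGE_PS:.0%} of the assumed edge time")
```

When the plateau time is a small fraction of the edge time, the step is mostly swallowed by the edge itself, which is the sense in which the size of the plateau depends on the edge rate.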
