FPGARelated.com
Forums

Altera Stratix IV GX Dev.Kit: PCI-E x4 device doesn't work in x8 slot

Started by Michael S November 14, 2009
Hi
We have a problem with the Altera Stratix IV GX FPGA Development Kit.
Specifically, we built a PCI Express design based on Altera's own
"hard" PCI-E core configured for Gen1 x4 operation. The design works
(more or less, but that's beyond the scope of this message) when it is
plugged into an x8 mechanical / x4 electrical slot. However, when it is
plugged into x8 or x16 mechanical slots that are electrically x8, the
design not only doesn't work but isn't even recognized by the host as a
valid PCI-E device. Exactly the same happens when we try to build an x1
device.
We verified (by plugging in off-the-shelf x1 and x4 PCI-E cards) that
it's not a host issue.
My only PCI-E book (Mindshare's "PCI Express System Architecture") says
virtually nothing about width negotiation, so right now I am totally
lost.

Any ideas to help?
On Nov 14, 3:18 pm, Michael S <already5cho...@yahoo.com> wrote:
> Hi
> We have a problem with Altera Stratix IV GX FPGA Development Kit.
> [...]
I'm guessing a bit, but... That card has an x8 PHY on it, so the motherboard thinks that it is an x8 card, and link aggregation fails. The motherboard (typically) decides that it's an x8 card by doing a receiver detect, and all the receiver detect does is check for a load on the drivers. To test this, mask off the receivers on the unused (last 4) channels. If I'm right (it could happen), the motherboard won't see those receivers and will aggregate the link as x4.

RK
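RK's hypothesis can be illustrated with a toy model. This is an editorial sketch, not the link training state machine defined in the PCIe Base Specification: it assumes the host's first guess at link width is the widest standard width whose lanes, starting at lane 0, all show a receiver load, limited to the lanes the slot actually wires up. The function `detected_width` and its inputs are hypothetical.

```python
# Toy model of the host's initial link-width guess from receiver detection.
# Assumption (editorial): the host picks the widest standard width whose
# lanes 0..w-1 all show a receiver load. Real link training can still fall
# back to a narrower width afterwards; this models only the first step.

LINK_WIDTHS = (1, 2, 4, 8, 12, 16, 32)  # link widths defined by PCI Express


def detected_width(lane_has_load, slot_lanes):
    """Return the width implied by receiver detection on the first slot_lanes lanes.

    lane_has_load: per-lane booleans as seen from the host, lane 0 first.
    slot_lanes: number of lanes the slot is electrically wired for.
    """
    contiguous = 0
    for has_load in lane_has_load[:slot_lanes]:
        if not has_load:
            break  # in this model a gap ends the usable lane group
        contiguous += 1
    fitting = [w for w in LINK_WIDTHS if w <= contiguous]
    return max(fitting, default=0)


x8_phy = [True] * 8                # every lane of the x8 PHY presents a load
masked = [True] * 4 + [False] * 4  # RK's test: receivers on lanes 4-7 masked off

print(detected_width(x8_phy, slot_lanes=8))  # 8: x8 electrical slot
print(detected_width(x8_phy, slot_lanes=4))  # 4: x8 mechanical / x4 electrical slot
print(detected_width(masked, slot_lanes=8))  # 4: x8 slot with lanes 4-7 masked
```

Under this model the x4-electrical slot can only ever probe four lanes, which is consistent with the card being recognized there, and masking lanes 4-7 would make an x8 slot look the same.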
On Nov 17, 8:10 pm, d_s_klein <d_s_kl...@yahoo.com> wrote:
> On Nov 14, 3:18 pm, Michael S <already5cho...@yahoo.com> wrote:
> > Hi
> > We have a problem with Altera Stratix IV GX FPGA Development Kit.
> > [...]
I talked to two of our PCIe experts, and they suggested checking three things:

The DIP switch that controls how many lanes are supported via the PCIe connector presence-detect settings could be set wrong. It is SW5 on the board; see table 2-15 in the board reference manual, http://www.altera.com/literature/manual/rm_sivgx_fpga_dev_board.pdf.

If it is an Intel motherboard, in some cases it sends out Vendor Defined messages. The customer's application design needs to ignore these messages (unless the customer is Intel, in which case the application might need to know what these messages are and do the right thing). If the application design doesn't accept these messages from the core, it will lock things up and cause configuration problems.

If you have Engineering Sample (not production) silicon, it could be a case of the Stratix IV GX ES erratum entitled "Endpoints Using the Hard IP Implementation Incorrectly Handle CfgRd0", as described in the IP release notes: http://www.altera.com/literature/rn/rn_ip.pdf. The workaround is to use production devices, or to use the soft IP (v9.0 or later) on ES devices.

Hope this helps,

Vaughn Betz
Altera
[v b e t z (at) altera.com]
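For a core that exposes raw TLPs (for example through a streaming interface), "ignoring" Vendor Defined messages means recognizing them in the TLP header and not letting them stall the receive path. A minimal sketch, assuming the standard PCIe TLP header layout (Fmt and Type in byte 0, Message Code in byte 7) and the Vendor_Defined message codes 0x7E (Type 0) and 0x7F (Type 1). The function names are hypothetical, not part of any Altera API, and the sketch says nothing about what the Avalon-MM variant of the core does internally.

```python
# Classify a received TLP header and decide what to do with Vendor Defined
# messages. Assumes the PCIe TLP header layout: byte 0 holds Fmt (bits 6:5
# in PCIe 1.x/2.x) and Type (bits 4:0); message TLPs use Type 10rrr, where
# rrr is the routing subfield; byte 7 of a message header is the Message Code.

VENDOR_DEFINED_TYPE0 = 0x7E
VENDOR_DEFINED_TYPE1 = 0x7F


def is_message(header: bytes) -> bool:
    tlp_type = header[0] & 0x1F
    return (tlp_type >> 3) == 0b10  # Type 10rrr = Msg / MsgD


def vdm_action(header: bytes) -> str:
    """Return 'process', 'discard' or 'unsupported_request' for a TLP header.

    A receiver that does not implement a Vendor Defined message silently
    discards Type 1 and treats Type 0 as an Unsupported Request.
    """
    if len(header) < 8:
        raise ValueError("need at least the first 8 header bytes")
    if not is_message(header):
        return "process"
    code = header[7]
    if code == VENDOR_DEFINED_TYPE1:
        return "discard"
    if code == VENDOR_DEFINED_TYPE0:
        return "unsupported_request"
    return "process"


# Msg routed by ID (Fmt=01, Type=10010) carrying a Vendor_Defined Type 1 code
vdm1 = bytes([0x32, 0, 0, 0, 0, 0, 0, VENDOR_DEFINED_TYPE1])
# 1-DW memory read request (Fmt=00, Type=00000): not a message
mrd = bytes([0x00, 0, 0, 1, 0, 0, 0, 0x0F])
print(vdm_action(vdm1))  # discard
print(vdm_action(mrd))   # process
```

The key design point is that the message is consumed (dropped or answered) rather than left unaccepted, which is the lock-up scenario described above.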
On Nov 20, 4:56 am, Vaughn <vaughnb...@gmail.com> wrote:
> I talked to two of our PCIe experts, and they suggested checking three
> things:
>
> The DIP switch that controls how many lanes are supported via the PCIe
> connector presence detect settings could be set wrong. It is SW5 on
> the board, see table 2-15 in the board reference manual http://www.altera.com/literature/manual/rm_sivgx_fpga_dev_board.pdf.
>
The DIP switch appears to have no effect at all. That is, I can set it to the x1-only position and the host and board still sometimes negotiate x4. Or I can set it to x1+x4+x8 and it sometimes correctly detects x4. The key word here is "sometimes". BTW, what are those presence detects? What are they supposed to do? Are they dev-kit specific, Altera-specific, or standard?
> If it is an Intel motherboard in some cases they send out Vendor
> Defined messages. The customer's application design needs to be
> designed to ignore these messages (unless the customer is Intel then
> their application might need to know what these messages are and do
> the right thing). If the application design doesn't accept these
> messages from the core it will lock things up and cause configuration
> problems.
>
Yes, it is an Intel board: a brand new S3420GPLC. I also tested on another Intel board, a 4-year-old desktop based on a 915-series chipset. It misbehaves in a similar manner. With a bit of effort I could find some Dell, Asus, or maybe Supermicro board to test, but all of them are based on Intel chipsets, so it probably wouldn't make a difference. Finding a test platform not based on an Intel chipset would be a serious challenge. Besides, even if I found one, it wouldn't help, since in the end it _has_ to work on Intel.

So tell me more about how exactly my application could ignore Vendor Defined messages. Please keep in mind that I am using the Avalon-MM variant of the PCIe core, so I don't have much control over what's going on under the hood.

One more point: with the soft PCIe IP it misbehaves slightly less (and differently) than with the hard IP, but it misbehaves nevertheless.

BTW, even in the slot that is x4 electrically things are not rosy. Quite often in that slot the board is detected as x1 or x2 instead of x4. When detected as x2 it tends not to work at all; when detected as x1 it behaves better (not good, just better). Anyway, I _need_ x4. If I wanted x1, I'd rather build it from a cheap PLX bridge plus a cheap Stratix III, a combo that "just works".
> If you have Engineering Sample (not production) silicon, it could be a
> case of the Stratix IV GX ES erratum entitled "Endpoints Using the
> Hard IP Implementation Incorrectly Handle CfgRd0" as described in the
> IP Release notes: http://www.altera.com/literature/rn/rn_ip.pdf.
> Workaround for this is to use production devices or use soft IP (v9.0
> or later) on ES devices.
How do I know whether it is an Engineering Sample or a production device?
> > Hope this helps,
Yes, it does, thanks. But more help is needed. In fact, I am starting to suspect that our kit is physically damaged. It would be a real shame if that is the case, with so much time already wasted, but better that than not finding a solution at all.
On Nov 20, 4:56 am, Vaughn <vaughnb...@gmail.com> wrote:
> On Nov 17, 8:10 pm, d_s_klein <d_s_kl...@yahoo.com> wrote:
>
> Hope this helps,
>
> Vaughn Betz
> Altera
> [v b e t z (at) altera.com]
Since we are already talking, one more question, which may or may not be related. I measured PCIe read latency from the host to a zero-latency Avalon-MM slave that lives in the pcie_clock_out clock domain. To my big surprise the latency was absolutely huge: around 1050 ns for the hard IP and 880 ns for the soft IP. I expected much shorter latency: 250 ns, 300 ns at worst. The measurements were done in the PCIe slot directly attached to the Xeon 3400 CPU, so from the host's perspective it's probably the fastest configuration currently in existence. Why is the read so slow? Is it (hopefully) another sign of a hardware problem? Or is Altera's ST-to-MM converter that slow (I find it hard to believe; by my estimate that particular part of the loop should contribute about 50 ns, if not less)? Or are Altera's PCIe IPs themselves poorly suited for serving host read accesses?
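For scale, the time the read spends on the wire is small compared with the 880-1050 ns measured. A back-of-envelope sketch, assuming Gen1 signalling (2.5 GT/s with 8b/10b, so 4 ns per symbol per lane), 3DW headers, no ECRC, and per-TLP framing of STP + sequence number + LCRC + END (8 bytes). PHY, data-link, core and host latencies are deliberately left out, and the helper names are hypothetical.

```python
# Back-of-envelope wire time for a single-DW PCIe Gen1 memory read.
# Assumptions: 2.5 GT/s with 8b/10b encoding, 3DW headers (32-bit address),
# no ECRC, no competing DLLPs or SKP ordered sets. PHY, data-link layer,
# core and host latencies are NOT included.

GEN1_TRANSFERS_PER_S = 2.5e9
BITS_PER_SYMBOL = 10           # 8b/10b: one byte -> one 10-bit symbol
FRAMING_BYTES = 1 + 2 + 4 + 1  # STP + sequence number + LCRC + END


def tlp_bytes(header_dw: int, payload_dw: int) -> int:
    """Bytes on the wire for one TLP, including framing."""
    return FRAMING_BYTES + 4 * (header_dw + payload_dw)


def wire_time_ns(n_bytes: int, lanes: int) -> float:
    """Serialization time when the bytes are striped across `lanes` lanes."""
    symbol_ns = BITS_PER_SYMBOL * 1e9 / GEN1_TRANSFERS_PER_S  # 4 ns per symbol
    symbol_times = -(-n_bytes // lanes)                       # ceiling division
    return symbol_times * symbol_ns


def read_wire_time_ns(lanes: int, read_dw: int = 1) -> float:
    request = tlp_bytes(header_dw=3, payload_dw=0)           # MRd
    completion = tlp_bytes(header_dw=3, payload_dw=read_dw)  # CplD
    return wire_time_ns(request, lanes) + wire_time_ns(completion, lanes)


print(read_wire_time_ns(4))  # 44.0
print(read_wire_time_ns(1))  # 176.0
```

Under these assumptions a 1-DW read at x4 occupies the link for roughly 44 ns in total, so nearly all of a ~1 us round trip would have to come from the PHYs, the core pipelines and the host rather than from serialization.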
Vaughn, where are you?

A couple of updates since last time:
1. Our kit is indeed based on an engineering sample.
2. Quartus II 9.1 appears to have exactly the same problems as 9.0 SP2,
which we used before.

[Whining on]
BTW, did you know that PCIe core v9.1 is not 100% source-code
compatible with v9.0? I thought the whole point of a _minor_
version number was that it's supposed to be backward compatible :( Next
time you break backward compatibility, please increment the major
version number; then, at least, we would know that trouble is
coming.
Plus, the "soft" IP is not 100% source-code compatible with the hard IP,
and the differences are more than just test inputs/outputs. It's pretty
annoying.
[Whining off]

Regards,
Michael