FPGARelated.com

Virtex-4FX embedded MAC and RocketIO data corruption?

Started by Marc Kelly June 19, 2006
Hi,

After a very enjoyable few days trying to sort out a test design
involving Ethernet and Virtex-4 I thought it time to ask some advice.

I have an FX60's embedded MAC using RocketIO to talk 1000Base-X to an SFP
module (Cat5 copper), which is then connected by crossover cable to an Intel
E1000 network card. There are no switches involved in the link. The
design is based on the 1.4.1 reference design that comes with CoreGen
8.1 for the embedded MAC. I am not using the Host Interface or the
like, just some basic logic hooked on the end of the MAC, with the MAC
statically configured using the tie-offs.

I can get the PC with the Intel card to send raw Ethernet frames to my
Virtex-4's MAC address, and they seem to get through about 40% of the
time, when I see the "frame good" signal; the rest of the time I see
"bad frame". Looking at more signals, I am seeing a large number of
disparity errors and NotInTable errors coming from the RocketIO module,
and what looks like corruption or bit shifts in the data.

I even see this when there is no traffic on the link and there are just
idle frames being sent. It seems semi-repetitive, with on the order of
a few hundred good bytes between small bursts of bad ones.
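
One way to put a number on "semi-repetitive" is to log a per-byte error flag (for example, the OR of the MGT's disparity-error and not-in-table outputs) and histogram the lengths of the good runs between bursts. The following is only a minimal sketch: it assumes the capture has already been exported as a sequence of 0/1 flags, and the function name and capture format are made up for illustration, not taken from any Xilinx tool.

```python
from collections import Counter


def good_run_lengths(error_flags):
    """Return the lengths of error-free runs that sit between two errors.

    error_flags: iterable of 0/1 (or bool), one entry per received byte,
    where 1 means the byte was flagged (disparity error or not-in-table).
    The leading and trailing good runs are ignored because the capture
    window cuts them short.
    """
    runs = []
    current = 0
    seen_error = False
    for flag in error_flags:
        if flag:
            # Close the good run only if it started after an earlier error.
            if seen_error and current > 0:
                runs.append(current)
            seen_error = True
            current = 0
        else:
            current += 1
    return runs


if __name__ == "__main__":
    # Hypothetical capture: a 2-byte error burst after every 300 good bytes.
    capture = ([0] * 300 + [1, 1]) * 4
    histogram = Counter(good_run_lengths(capture))
    for length, count in sorted(histogram.items()):
        print(f"{length} good bytes between bursts: seen {count} times")
```

A sharp peak in the histogram would point toward a periodic source (a beating clock or a switching supply, say), while a broad spread would look more like random noise on the link.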

Anyone got any ideas?

I have checked the network card and the cables (long ones, short ones, good
quality ones and bad ones). All cards and cables pass large-statistics
back-to-back testing at 1 Gig between PCs with very few dropped frames.

My initial suspicion was the reference clock being fed to the RocketIO
(which I think is okay; it's the ref clock from a PCI Express socket going
via an ICS9DB202 to clean it up and make it 125 MHz). I have considered
heat too, so we bolted a larger heatsink and fan onto the chip...

Words of wisdom will be gratefully received...
--
/\/\arc Kelly
..Just your average physicist trying to get by in a world full of normal
people...


Marc Kelly wrote:
> Hi,
>
> After a very enjoyable few days trying to sort out a test design
> involving Ethernet and Virtex-4 I thought it time to ask some advice.
>
> I have a FX60's embedded MAC using RocketIO to talk 1000Base-X to an SFP
> module(cat5 copper) this is then cross-over cable connected to an Intel
> E1000 network card. There are no switches involved in the links. The
> design is based on the 1.4.1 reference design that comes with coregen
> 8.1 for the embedded-mac. I am not using the Host Interface or such
> like, just a basic logic hooked on the end of the MAC with it statically
> configured using the tie-offs.

Have you seen the release notes and applied all the fixes?

Also, I don't know about 1000Base-X, but with SGMII you need an MDIO clock, either internal (by configuring the divider through the host interface) or external (via the MDIO clock input pin). Without it, I never could get the SGMII to autonegotiate with the PHY.

> I can get the PC with Intel card to send raw Ethernet frames to my
> virtex-4's mac address and they seem to get through about 40% of the
> time, and I see the "frame good" signal, the rest of the time I see a
> "bad frame". On looking at more signals I am seeing a large amount of
> disparity error and NotInTable coming from the RocketIO module and what
> looks like corruption or bit shifts on the data.
>
> I even see this when there is no traffic on the link and there are just
> idle frames being sent.. It seems semi-repetitive, with say the order of
> a few hundred good bytes between small bursts of bad ones.
>
> Anyone got any ideas?

I'm working with EMAC and SGMII, and I had errors because my SGMII board already has AC coupling capacitors in the RX path (at the PHY end), so with the on-chip AC coupling as well, the signal was just too attenuated. Switching off the internal AC coupling did the trick.

> My initial thoughts were the reference clock being fed to the RocketIO
> (which i think is okay, its the ref clock from a PCIexpress socket going
> via a ICS9DB202 to clean it up and make it 125Mhz.) I have considered
> heat too, so we bolted a larger heatsink and fan onto the chip...

If it's from a PCI Express socket, are you sure its spread spectrum is deactivated? If not, even with a PLL, for a few ms it will be quite far off 125 MHz.

Sylvain
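
To put rough numbers on the spread-spectrum concern: PCI Express allows the 100 MHz reference clock to be down-spread by up to 0.5%, while 1000BASE-X expects the clock to stay within +/-100 ppm. The sketch below simply compares the two. It assumes the worst case, in which the PLL tracks the modulation fully and multiplies 100 MHz up to 125 MHz; the actual behaviour of the clock chip on this board is not known from the thread, and the function names are illustrative only.

```python
PCIE_REFCLK_HZ = 100e6       # nominal PCI Express reference clock
SSC_DOWN_SPREAD = 0.005      # up to 0.5 % down-spread permitted by PCIe
PLL_RATIO = 125e6 / 100e6    # assumed 100 MHz -> 125 MHz multiplication
GIGE_TOLERANCE_PPM = 100.0   # 1000BASE-X clock tolerance, +/-100 ppm


def worst_case_offset_ppm(spread):
    """Frequency offset, in ppm, at the bottom of a down-spread profile."""
    return spread * 1e6


def lowest_output_hz(nominal_in_hz, spread, ratio):
    """Lowest output frequency if the PLL follows the spread completely."""
    return nominal_in_hz * (1.0 - spread) * ratio


if __name__ == "__main__":
    offset = worst_case_offset_ppm(SSC_DOWN_SPREAD)
    lowest = lowest_output_hz(PCIE_REFCLK_HZ, SSC_DOWN_SPREAD, PLL_RATIO)
    print(f"worst-case offset: {offset:.0f} ppm")
    print(f"lowest 125 MHz clock: {lowest / 1e6:.3f} MHz")
    print(f"times over the Ethernet budget: {offset / GIGE_TOLERANCE_PPM:.0f}x")
```

Under those assumptions the clock can wander far outside what the link partner expects, which is why it is worth confirming that spread spectrum is really off rather than just filtered.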
Marc,
Did you design the board with the Xilinx part on it in house?
After checking the quality of the reference clock, I would look at the power
supply to the MGTs and the quality of the PCB routing between the Xilinx part
and the SFP. Is there any way you can look at the RocketIO signals and check
the eye pattern?

/MikeJ
(x-physicist)

MikeJ wrote:

> Have you designed the board with the Xilinx on in house ?

It's actually a prototyping board bought from PLDA (their XpressFX60 board) with the SFP sockets already on it. Their main selling point is its PCI Express capability, but we're after it for the high-speed I/O currently.

> Any way you can look at the RocketIO signals and check the eye
> pattern?

It's possible, I think. I would have to get our hardware people onto it, as I do mostly firmware and so they keep all the very high-spec scopes hidden from me :)
--
/\/\arc Kelly
..Just your average physicist trying to get by in a world full of normal
people...

Sylvain Munaut wrote:

> Have you seen the release notes and applied all the fixes ?

Yes, I had been hoping that one of them would magically fix things; sadly not.

> Also, I don't know about 1000BaseX, but with SGMII, you need a MDIO
> clock ... Without it, I never could get the SGMII to autonegotiate
> with the PHY ...

I believe things are negotiating, although I may be wrong. The fact that I am seeing real frames that work, and that the system can echo them back to the PC as well, gave me some confidence. I will check the MDIO clock issue, however.

>> Anyone got any ideas?
> I'm working with EMAC and SGMII, and I had errors because my sgmii board
> already has AC coupling capacitors in the rx path (at the phy end), so
> with the on-chip ac coupling, the signal was just too attenuated.
> Switching off the internal ac coupling did the trick.

Ah, that does sound interesting. I will have to check the schematics for the board(s) tomorrow at work and see. It does seem to be the kind of thing that might be causing it. If so, then I owe you many beers...

> If it's from a PCIexpress socket, are you sure it's spread spectrum is
> deactivated ?

Yeah, we turned off all the spread spectrum settings in a moment of inspiration. Sadly it didn't seem to have any effect. I even tried clocking it from a DCM-generated 125 MHz clock, just to see what happened: same effect as with the proper ref clock.
--
/\/\arc Kelly
..Just your average physicist trying to get by in a world full of normal
people...

> It's actually a prototyping board bought from PLDA (their XpressFX60
> board) with the SFP sockets already on it. Their main selling point is
> its PCIexpress capability, but we're after it for the high speed IO
> currently.

OK, I know of that board. They should know what they are doing, and I assume they have looked at the eye pattern.

>> Any way you can look at the RocketIO signals and check the eye
>> pattern?
>
> Its possible I think, would have to get our hardware people onto it, as
> I do mostly firmware and so they keep all very high spec scopes hidden
> from me :)

I know that feeling, having had some colleagues break or misplace expensive probes :)

Maybe it's worth replacing the reference clock with a quality low-jitter differential oscillator to see if it makes any difference? You can get away with some carefully length-matched twisted-pair mod wire (my company does it quite often), but make sure the oscillator power supply is good, and put an SMD cap across the pins of the oscillator at least.

/MikeJ

> Maybe it's worth replacing the reference clock with a quality low jitter
> differential oscillator and see if it makes any difference ?
> You can get away with some carefully matched length twisted pair mod
> wire - my company does it quite often ... but make sure the oscillator
> power supply is good - and put a smd cap across the pins of the oscillator
> at least.

Just to clarify, that is a cap across the power pins of the oscillator!

/Mike

MikeJ wrote:
>> Maybe it's worth replacing the reference clock with a quality low jitter
>> differential oscillator and see if it makes any difference ?

The board actually has a spare footprint for mounting an oscillator that feeds into one of the MGT reference clocks. I need to check whether it feeds the correct column to drive the RocketIO I need.

'Tis the joys of playing with such fun hardware, I guess.
--
/\/\arc Kelly
..Just your average physicist trying to get by in a world full of normal
people...

Well, this is a long shot, but is it possible that one end is fixed at
full duplex (no negotiation) while the other is trying to negotiate and
falls back to half duplex? This is a fairly common problem: the
link appears to work for simple pings, but any real traffic has
massive numbers of errors. This is because one end is in full duplex
and transmits while receiving, and the half-duplex end sees this as a
collision.

Just a possibility... and I'm only SURE that this happens with
10BASE-T and 100BASE-TX, not with 1000BASE-X.

-bh

bh wrote:
> Well this is a long shot, but is it possible that one is fixed at Full
> Duplex (no-negotiate) while the other is trying to negotiate and
> falls-back to half-duplex?

I have tried with both ends forced to full duplex, just in case, and made a new crossover cable too, just in case.

I had some good success with turning off the internal AC-coupling caps as someone mentioned, and things look more sane. For small packets of about 65-128 bytes I get good transmission with maybe 1-2% packet loss; larger packets seem to be a problem, however.

With an idle link I see the /K28.5/D16.2/ idle pattern fine, but sometimes the /D16.2/ is corrupt and gives a "notintable" error from the MGT, always with what seems to be the same pattern.

I need to get Synplicity's Identify_debugger to play nicely tomorrow with a nice long sample memory to check how regularly this is happening. The external logic analyser I have access to currently doesn't have the depth when running at a decent speed.

Maybe a possible issue with the MGT itself? I can move to another one, I think, and test that.
--
/\/\arc Kelly
..Just your average physicist trying to get by in a world full of normal
people...
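
For checking whether the corrupt /D16.2/ really is "always the same pattern", comparing each captured symbol with the expected idle code group and recording which bits differ may help. The sketch below is only an illustration under several assumptions: that raw 10-bit symbols are available (for instance captured ahead of the 8b/10b decoder), that the link has settled into the steady idle where K28.5 is sent at negative running disparity followed by D16.2, and that the capture starts on a K28.5. The function names and capture format are hypothetical.

```python
# 10-bit code groups written in transmission order (abcdei fghj).
K28_5_RD_NEG = "0011111010"  # K28.5 as sent when running disparity is negative
D16_2_RD_POS = "1001000101"  # D16.2 as sent when running disparity is positive


def diff_mask(expected, received):
    """Mark every differing bit position with '1' (a bitwise XOR)."""
    return "".join("1" if e != r else "0" for e, r in zip(expected, received))


def check_idle_stream(symbols):
    """Compare captured symbols with the repeating /K28.5/D16.2/ idle.

    symbols: list of 10-character '0'/'1' strings, starting on a K28.5.
    Returns (index, expected, received, mask) for every mismatching symbol.
    """
    expected_pair = (K28_5_RD_NEG, D16_2_RD_POS)
    mismatches = []
    for index, received in enumerate(symbols):
        expected = expected_pair[index % 2]
        if received != expected:
            mask = diff_mask(expected, received)
            mismatches.append((index, expected, received, mask))
    return mismatches


if __name__ == "__main__":
    # Hypothetical capture with one corrupted D16.2 (a single flipped bit).
    capture = [K28_5_RD_NEG, D16_2_RD_POS, K28_5_RD_NEG, "1001010101"]
    for index, expected, received, mask in check_idle_stream(capture):
        print(f"symbol {index}: expected {expected}, got {received}, diff {mask}")
```

If the same mask keeps turning up, that would suggest something deterministic (data-dependent jitter or a marginal bit position, for example) rather than random noise, though that conclusion is a guess until a long capture confirms it.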