Xilinx QUIZ 2008

System setup:

* Xilinx Virtex FPGA
  - DDR2 memory
  - SFP sockets on MGTs
  - Gigabit TEMAC with SG DMA

1G fiber module in the SFP, fiber to a D-Link media adapter, cable to a D-Link GB switch, cable to the PC's GB Ethernet port.

The PC is running only one custom application. The FPGA is sending UDP packets to the PC. The PC sends a small number of small UDP packets, which the FPGA responds to.

Problem: UDP commands are no longer processed or responded to by the FPGA after, say, 15 minutes from the start of communication. The time depends on the PC and the application running; sometimes it may be several hours before the communication stops.

Yesterday I thought I had solved the problem: the RX buffers were not aligned properly, so I assumed that could have caused the problem for the SG DMA. But after fixing this, the problem persisted.

The PPC is not running wild, nor is there a spurious reset coming; the main loop is still working, and the interrupts as well. But after the failure the DMA registers contain either 0, random or wrong values.

I have been troubleshooting this system for some time already and had many great ideas about what could have caused the problem, but none of them made any difference.

Hmm... adding single-character UART debug symbols made it NOT FAIL (or maybe I did not wait long enough), so I removed those debug printouts to make the problem visible again.

I have a mini UART debug routine built in, so I can type commands to read memory and the DCR bus whenever I want while the system is running. So I see the DMA regs being corrupted, but that doesn't give much of a hint as to how or why. It looks like the TX BD address value has been written to the RX LEN register; the other regs are either 0 or completely random.

Anybody dare to propose a solution? Yesterday I believed the answer to be ALIGN, but I was wrong.

Antti
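For readers who want to rule out the buffer alignment issue mentioned in the post, here is a minimal C sketch of an alignment check. The 32-byte value in `RX_BUF_ALIGN` and the helper name `is_aligned` are assumptions made for illustration; the post does not state the alignment the SG DMA actually requires, so the real figure has to come from the DMA core's documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: 32 bytes is only an illustrative value, not taken from the post. */
#define RX_BUF_ALIGN 32u

/* Returns nonzero when p sits on an 'align'-byte boundary.
 * 'align' must be a nonzero power of two. */
static int is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}
```

A check like this could be run once over every RX buffer address right after the descriptor list is built, so a misaligned buffer is reported at start-up rather than showing up as a DMA fault much later.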
Xilinx QUIZ 2008
Started by ●December 30, 2008
Reply by ●December 30, 2008
Antti wrote:
> Xilinx QUIZ 2008
>
> system setup
>
> * Xilinx Virtex FPGA
> - DDR2 memory
> - SFP sockets on MGTs
> - Gigabit TEMAC with SG DMA
>

OK, here is what I can remember from running into Virtex-4FX issues with Ethernet (information from about one year ago, so it may have changed in the meantime):

SG-DMA used to be buggy.

The HardTemac version used to be silicon-revision dependent (and was not selected properly automatically).

Maybe that helps a little.

Regards,

Lorenz
Reply by ●December 30, 2008
On Dec 30, 2:11 pm, Lorenz Kolb <lorenz.k...@uni-ulm.de> wrote:
> Antti wrote:
> > Xilinx QUIZ 2008
> >
> > system setup
> >
> > * Xilinx Virtex FPGA
> > - DDR2 memory
> > - SFP sockets on MGTs
> > - Gigabit TEMAC with SG DMA
>
> Ok, what I can remember from running in Virtex-4FX issues (information
> from about one year ago, so maybe that has changed meanwhile) with ethernet:
>
> SG-DMA used to be buggy.
>
> HardTemac-Version used to be Silicon-Revision dependend (and was not
> selected properly automatically)
>
> Maybe that helps a little.
>
> Regards,
>
> Lorenz

Thank you, well, it is not very encouraging :(

The system uses

LL_TEMAC_SGMII_V1_00a (user modified!)
and MPMC2
#define GUI_VERSION 1.9
#define PCORE_VERSION _v2_10_a
#define pcorename mpmc2_ddr2_pnncc_200mhz_x16_mt47h16m16_3

I know this is rather old and so on, but currently I have no option to upgrade the complete system to MPMC2 4.x.

Anything you recall about the bugginess? What did the buggy behavior cause?

I think the hw revision is not an issue, as the MGT seems to work OK under all circumstances; just the DMA engine DCR registers get wrong values, and then it all stops.

Antti
Reply by ●December 30, 2008
Antti wrote:
> On Dec 30, 2:11 pm, Lorenz Kolb <lorenz.k...@uni-ulm.de> wrote:
>> Antti wrote:
>>> Xilinx QUIZ 2008
>>> system setup
>>> * Xilinx Virtex FPGA
>>> - DDR2 memory
>>> - SFP sockets on MGTs
>>> - Gigabit TEMAC with SG DMA
>> Ok, what I can remember from running in Virtex-4FX issues (information
>> from about one year ago, so maybe that has changed meanwhile) with ethernet:
>>
>> SG-DMA used to be buggy.
>>
>> HardTemac-Version used to be Silicon-Revision dependend (and was not
>> selected properly automatically)
>>
>> Maybe that helps a little.
>>
>> Regards,
>>
>> Lorenz
>
> Thank you, well it not very encouraging :(
> the system uses
>
> LL_TEMAC_SGMII_V1_00a (user modified!)
> and MPMC2
> #define GUI_VERSION 1.9
> #define PCORE_VERSION _v2_10_a
> #define pcorename mpmc2_ddr2_pnncc_200mhz_x16_mt47h16m16_3

Ah, sorry, I'm out. Until now I have only used the old-fashioned way (PLB_Temac + HardTemac), as I didn't need the extra performance of an MPMC.

But please also check your version of the hard_temac IP core: e.g. for the silicon revision of our ML403s we needed

BEGIN hard_temac
 PARAMETER INSTANCE = hard_temac_0
 PARAMETER HW_VER = 3.00.a

though EDK encouraged us to use 3.00.b instead...

Good luck, anyway,

Lorenz
Reply by ●December 31, 2008
Hi Antti,

Try disabling the cache if it is enabled.
Try increasing the stack.

Also, take a look at the old GSRD reference design using MPMC and LL_TEMAC. It used to work quite reliably, but it has been a long time since I last tried it.

/Mikhail
Reply by ●December 31, 2008
Antti <Antti.Lukats@googlemail.com> wrote:
> PC sends very little amount of small UDP
> packets that are responded by FPGA

I'm wondering if greatly increasing the volume of packets going from the PC to the FPGA would make the problem reproduce faster, etc. Do you have the flexibility to change the PC side to increase or even flood it with status checks or some noop command?

G.
Reply by ●January 2, 2009
On Dec 31 2008, 6:57 pm, "MM" <mb...@yahoo.com> wrote:
> Hi Antti,
>
> Try disabling cache if it is enabled.
> Try increasing the stack.
>
> Also, take a look at the old GSRD reference design using MPMC and LL_TEMAC.
> It used to work quite reliably but it was long time ago since I tried it
> last time.
>
> /Mikhail

I-Cache is enabled and D-Cache is disabled, but I think the D-Cache invalidate calls are still made, so it's a good idea to remove them (or check that they are not called).

Antti
Reply by ●January 2, 2009
On Dec 31 2008, 11:14 pm, ga...@allegro.com (Gavin Scott) wrote:
> Antti <Antti.Luk...@googlemail.com> wrote:
> > PC sends very little amount of small UDP
> > packets that are responded by FPGA
>
> I'm wondering if greatly increasing the volume of packets going from
> the PC to the FPGA would make the problem reproduce faster, etc. Do
> you have the flexibility to change the PC side to increase or even
> flood it with status checks or some noop command?
>
> G.

It seems to be related, yes: when the demo app is running on the PC the failure takes longer to happen, and when the real app is running the failure seems to happen earlier. The real app sends more packets to the FPGA.

I have not tried flooding yet, but I have monitored the Rx/Tx buffer descriptor list fill level. While it is working there is NEVER more than 1 incoming packet in the buffer chain, so there is no slow overflow of the buffer descriptor chain.

Antti
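The fill-level monitoring described in this post could be sketched in C roughly as follows. Everything here is hypothetical: the `rx_bd_t` layout, the `BD_STS_COMPLETED` flag and the function name are illustrative stand-ins, not the real LL_TEMAC/MPMC2 descriptor format, which the thread does not show.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL "DMA has filled this descriptor" flag. */
#define BD_STS_COMPLETED 0x1u

/* HYPOTHETICAL, simplified buffer descriptor. */
typedef struct rx_bd {
    volatile uint32_t buf_len;  /* buffer size, later the received length */
    volatile uint32_t status;   /* completion flags written by the DMA    */
} rx_bd_t;

/* Count descriptors the DMA has completed but software has not yet
 * released: the "fill level" of the RX chain. */
static size_t rx_ring_pending(const rx_bd_t *ring, size_t count)
{
    assert(ring != NULL);
    size_t pending = 0;
    for (size_t i = 0; i < count; ++i) {
        if ((ring[i].status & BD_STS_COMPLETED) != 0u) {
            ++pending;
        }
    }
    return pending;
}
```

Calling such a function from the main loop and remembering the largest value seen would give a high-water mark, which is one way to confirm the "never more than 1" observation over a long run.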
Reply by ●January 4, 2009
On Jan 2, 10:20 am, Antti <Antti.Luk...@googlemail.com> wrote:
> On Dec 31 2008, 11:14 pm, ga...@allegro.com (Gavin Scott) wrote:
>
> > Antti <Antti.Luk...@googlemail.com> wrote:
> > > PC sends very little amount of small UDP
> > > packets that are responded by FPGA
>
> > I'm wondering if greatly increasing the volume of packets going from
> > the PC to the FPGA would make the problem reproduce faster, etc. Do
> > you have the flexibility to change the PC side to increase or even
> > flood it with status checks or some noop command?
>
> > G.
>
> it seems to have relation yes, when demo app is running on PC
> the failure happens in longer time, when the real app is running
> failure seems to happen earlier. The real app sends more
> packets to FPGA
>
> I have not tried flooding yet, but i have monitored the Rx/Tx
> buffer descriptor list fill level, when working there is NEVER
> more than 1 incoming packet in the buffer chain
> so there is no slow overflow of the buffer descriptor chain
>
> Antti

I hope I have finally found the real issue...

A few days ago I made an "ISSUE LIST" in an Excel table where I note the possible issues, their probability, methods of testing, etc.

The table had 26 items.

But one VERY important item was missing, something that should always be on the list:

"stupid software bug"

How could I have left it off my list? The original software was not written by me, and it is not very good, robust or tested, but... it has been reported as working 100% on some occasions, so I assumed there was no systematic problem with it. (All assumptions are to be considered false.)

But the RX BD list is initialized once!!! ONCE!! The software does not write the buflen any more after the initialization, so the BD list gets dirty and is never cleaned/released.

The DMA will write num_received into buflen (which was previously set to 2048). After that this buflen is no longer modified, and neither is the stats field.

I truly hope this is the problem.

If not, then the next item to check on my list is jitter introduced by DCM chaining, causing some unexplained odd behaviour in the MPMC/DMA/DDR2... I hope it is not the DCM jitter problem.

Antti

PS: And if somebody thinks I should have seen it earlier: I compared some of the code with the Xilinx example code and there was also no BD reinit there, so I did not check deeper in the drivers. But the drivers can't do that part, so the code is really just missing.
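The missing step described above, re-arming each RX descriptor after its packet has been consumed, might look roughly like the C sketch below. It is illustrative only and rests on stated assumptions: `rx_bd_t`, `BD_STS_COMPLETED` and `rx_bd_recycle` are hypothetical names, the real SDMA descriptor has more fields and a different layout, and the thread does not confirm that this change fixed the hang. The 2048-byte buffer length is the one figure taken from the post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RX_BUF_SIZE      2048u  /* buffer length quoted in the post      */
#define BD_STS_COMPLETED 0x1u   /* HYPOTHETICAL "DMA done" status flag   */

/* HYPOTHETICAL, simplified RX buffer descriptor. */
typedef struct rx_bd {
    volatile uint32_t buf_addr; /* address of the receive buffer         */
    volatile uint32_t buf_len;  /* buffer size; DMA overwrites it with   */
                                /* the number of bytes received          */
    volatile uint32_t status;   /* completion flags written by the DMA   */
} rx_bd_t;

/* If the descriptor has been completed, return the received length and
 * restore the descriptor to its initial state so it can be reused.
 * Returns 0 when the DMA has not completed the descriptor yet. */
static uint32_t rx_bd_recycle(rx_bd_t *bd)
{
    assert(bd != NULL);
    if ((bd->status & BD_STS_COMPLETED) == 0u) {
        return 0u;                     /* nothing received, leave as is  */
    }
    uint32_t received = bd->buf_len;   /* byte count left by the DMA     */
    bd->buf_len = RX_BUF_SIZE;         /* restore the full buffer size   */
    bd->status  = 0u;                  /* clear stale completion flags   */
    return received;
}
```

The point of the sketch is the last two stores: without them the descriptor keeps the short length and old flags from the previous packet, which matches the "dirty" list described in the post. On real hardware with the data cache enabled, a flush of the descriptor would also be needed before handing it back to the DMA.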
Reply by ●January 4, 2009
>a few days ago i had a "ISSUE LIST" in excel table
>where i note the possible issues, their probability, methods of
>testing, etc..
>
>the table had 26 items.
>
>but one VERY important item was missing,
>something that should always be on the list:
>
>"stupid software bug"

Don't overlook the smart software bugs.

Many years ago, as a project was wrapping up, I made a list of the places where a bug could come from. I wish I had saved a copy.

The list included
  bugs in microcode
  bugs in microcode assembler
  bugs in data sheet

The one that I would have missed if I hadn't done it:
  bugs in my reading of a datasheet

--
These are my opinions, not necessarily my employer's. I hate spam.





