Reply by vcar May 16, 2009
> We've done a number of PCIe designs and take the conservative approach of
> using parallel FLASH to ensure the FPGA configures within the PCI spec reset
> time. However, we've found that to be too conservative, as the PC goes
> through many reset cycles when it powers up. The BIOS will bring it out of
> reset and configure the bus, then the PC will go through another reset cycle
> before the OS comes up.
So I should not worry about the SPI Flash configuration speed at all?
> Since BW is a concern in your design, you'll want to make sure you have a
> DMA engine on your endpoint device and make sure it supports scatter
> gather, as both Windows and Linux may only provide small memory ranges for
> where to stuff your data on the PC side. DMA is required because transfers
> originating on the PC side will be broken into single cycle accesses. To do
> bursting of data, the transfers will need to originate in the endpoint.
I read application note XAPP1052, and its bus master DMA design achieves a bandwidth of 6912 Mbps card to PC and 5440 Mbps PC to card in the PCIe X4 configuration. XAPP1052 does not implement scatter-gather DMA, and the performance is already acceptable. Although the test uses a small data volume (32768 bytes), larger datasets could be divided into multiples of 32768 bytes. Without a scatter-gather DMA mechanism, will the bandwidth drop significantly in my application, where data arrives at 500 Mbytes/s?
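As a rough check on the figures quoted above, the following sketch converts the XAPP1052 throughput numbers from Mbps to Mbytes/s and compares them with the 500 Mbytes/s target. It assumes 1 Mbps = 10^6 bits/s and 1 Mbyte = 10^6 bytes; all names are illustrative and are not taken from XAPP1052. It only restates the arithmetic and says nothing about how the lack of scatter-gather would affect a real system.

```python
# Unit-conversion check on the numbers quoted in this post.
# All names here are illustrative; none come from XAPP1052 itself.

def mbps_to_mbytes_per_s(mbps: float) -> float:
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbps / 8.0

CARD_TO_PC_MBPS = 6912        # quoted XAPP1052 figure, X4, card to PC
PC_TO_CARD_MBPS = 5440        # quoted XAPP1052 figure, X4, PC to card
REQUIRED_MBYTES_PER_S = 500   # ADC data rate from the original post
BLOCK_BYTES = 32768           # transfer size used in the quoted test

card_to_pc = mbps_to_mbytes_per_s(CARD_TO_PC_MBPS)
pc_to_card = mbps_to_mbytes_per_s(PC_TO_CARD_MBPS)

# Margin between the quoted card-to-PC rate and the required rate.
headroom = card_to_pc - REQUIRED_MBYTES_PER_S

# How many 32768-byte blocks per second are needed to sustain the ADC rate
# (assumes 1 Mbyte = 10**6 bytes).
blocks_per_second = REQUIRED_MBYTES_PER_S * 1e6 / BLOCK_BYTES

print(f"card to PC: {card_to_pc:.0f} Mbytes/s")
print(f"PC to card: {pc_to_card:.0f} Mbytes/s")
print(f"headroom over 500 Mbytes/s: {headroom:.0f} Mbytes/s")
print(f"blocks per second needed: {blocks_per_second:.0f}")
```

Under these assumptions the quoted card-to-PC figure leaves some margin over the required rate, but the driver has to start a new 32768-byte transfer many thousands of times per second, so per-transfer setup overhead matters.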
Reply by Mives May 15, 2009
> Another problem is that I want to configure the FPGA through an SPI
> Flash. Could the FPGA be configured successfully before the PC powers up
> and enumerates the device?
We've done a number of PCIe designs and take the conservative approach of using parallel FLASH to ensure the FPGA configures within the PCI spec reset time. However, we've found that to be too conservative, as the PC goes through many reset cycles when it powers up. The BIOS will bring it out of reset and configure the bus, then the PC will go through another reset cycle before the OS comes up.

Since BW is a concern in your design, you'll want to make sure you have a DMA engine on your endpoint device and make sure it supports scatter gather, as both Windows and Linux may only provide small memory ranges for where to stuff your data on the PC side. DMA is required because transfers originating on the PC side will be broken into single cycle accesses. To do bursting of data, the transfers will need to originate in the endpoint.

mike.ives@plexus.com
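To illustrate why scatter gather matters when the host only offers small memory ranges, here is a minimal host-side model of a descriptor list. It is a sketch under assumptions, not the Xilinx reference design or any operating system API: the `Descriptor` class, the `build_sg_list` function, the 4096-byte page size, and the example addresses are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

PAGE_SIZE = 4096  # assumed host page size, for illustration only


@dataclass
class Descriptor:
    """One contiguous chunk the DMA engine can burst to (hypothetical layout)."""
    address: int  # bus address of the chunk
    length: int   # chunk length in bytes


def build_sg_list(page_addresses: List[int], total_bytes: int,
                  page_size: int = PAGE_SIZE) -> List[Descriptor]:
    """Spread one transfer over scattered pages, merging adjacent pages."""
    descriptors: List[Descriptor] = []
    remaining = total_bytes
    for addr in page_addresses:
        if remaining <= 0:
            break
        chunk = min(page_size, remaining)
        last = descriptors[-1] if descriptors else None
        if last is not None and last.address + last.length == addr:
            # This page directly follows the previous chunk: extend it.
            last.length += chunk
        else:
            descriptors.append(Descriptor(addr, chunk))
        remaining -= chunk
    if remaining > 0:
        raise ValueError("not enough pages for the requested transfer")
    return descriptors


# Example: four pages, the first two physically adjacent (made-up addresses).
example_pages = [0x10000, 0x11000, 0x30000, 0x52000]
sg = build_sg_list(example_pages, 3 * PAGE_SIZE + 100)
for d in sg:
    print(f"addr=0x{d.address:x} len={d.length}")
```

An endpoint with scatter-gather support would walk such a list on its own and issue one burst per descriptor. Without it, the driver has to program each chunk as a separate transfer.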
Reply by vcar May 15, 2009
Another problem is that I want to configure the FPGA through an SPI
Flash. Could the FPGA be configured successfully before the PC powers up
and enumerates the device?
Reply by vcar May 13, 2009
On May 13, 12:10 am, Rob Gaddi <rga...@technologyhighland.com> wrote:
> On Tue, 12 May 2009 07:54:52 -0700 (PDT)
> vcar <hi...@163.com> wrote:
>
> > I want to capture data from an ADC and send to PC though PCI-E. I
> > prefer using the Virtex5 XC5VLX20T-FF323 to implement the PCI-E Lane
> > X4 interface. I need real time data capturing, and the input data rate
> > is about 500Mbytes per second. Although the PCI-E X4 could provide a
> > data channel of about 800Mbytes per second (based on the Xilinx ML555
> > PCI-E X4 experiment data), I am afraid that if the host PC has other
> > PCI-E devices(say video card or another data capture card), and the
> > data transfer performance would be affected. Therefore I need proper
> > data buffering:
> >
> > A. Use Block RAM to buffer captured data. The buffer is quite
> > small, I could use twenty 36kBRAM at most, equivalent to 20X2Kbytes =
> > 40Kbytes. I do not know whether this buffering depth is enough or
> > not. If the PCI-E bus is very busy or other device is occupying for a
> > long time, the buffer will overflow.
> >
> > B. Use external DDR2 memory to buffer captured data. The
> > buffer depth is not a problem, but I need to implement an 32bit DDR2
> > interface, and the free user I/O of FF323 package is not enough. I
> > have to choose XC5VLX30T-FF665, which will increase the cost largely.
> >
> > Now I am eager to prove that 20~40Kbytes buffer depth is enough for my
> > application. However I check the PCI-E v1.1 spec for several days,
> > this is not such conclusion or experiment data.
> >
> > Please give me some suggestions. Thanks a lot!
>
> PCIe is a point-to-point link, not a shared bus. You own the four
> lanes coming up to your board entirely, no time sharing.
>
> The problem is that you don't know whether those lines go directly back
> to your northbridge/RAM controller or whether they're going through a
> PCIe-PCIe switch that you're sharing with other devices. But that's
> not anything you'll be able to find in the PCIe documentation, that's
> specific to the actual motherboard you're plugged into.
>
> --
> Rob Gaddi, Highland Technology
> Email address is currently out of order
Thank you. Since there are always some uncertainties in the real world, I cannot assume that the northbridge is always free to accept captured data, so the FPGA BRAM buffering may not be sufficient. The safest approach would be to add DDR2 for buffering. Is this right?
Reply by Rob Gaddi May 12, 2009
On Tue, 12 May 2009 07:54:52 -0700 (PDT)
vcar <hitsx@163.com> wrote:

> I want to capture data from an ADC and send to PC though PCI-E. I
> prefer using the Virtex5 XC5VLX20T-FF323 to implement the PCI-E Lane
> X4 interface. I need real time data capturing, and the input data rate
> is about 500Mbytes per second. Although the PCI-E X4 could provide a
> data channel of about 800Mbytes per second (based on the Xilinx ML555
> PCI-E X4 experiment data), I am afraid that if the host PC has other
> PCI-E devices(say video card or another data capture card), and the
> data transfer performance would be affected. Therefore I need proper
> data buffering:
>
> A. Use Block RAM to buffer captured data. The buffer is quite
> small, I could use twenty 36kBRAM at most, equivalent to 20X2Kbytes =
> 40Kbytes. I do not know whether this buffering depth is enough or
> not. If the PCI-E bus is very busy or other device is occupying for a
> long time, the buffer will overflow.
>
> B. Use external DDR2 memory to buffer captured data. The
> buffer depth is not a problem, but I need to implement an 32bit DDR2
> interface, and the free user I/O of FF323 package is not enough. I
> have to choose XC5VLX30T-FF665, which will increase the cost largely.
>
> Now I am eager to prove that 20~40Kbytes buffer depth is enough for my
> application. However I check the PCI-E v1.1 spec for several days,
> this is not such conclusion or experiment data.
>
> Please give me some suggestions. Thanks a lot!
PCIe is a point-to-point link, not a shared bus. You own the four lanes coming up to your board entirely, no time sharing.

The problem is that you don't know whether those lines go directly back to your northbridge/RAM controller or whether they're going through a PCIe-PCIe switch that you're sharing with other devices. But that's not anything you'll be able to find in the PCIe documentation, that's specific to the actual motherboard you're plugged into.

--
Rob Gaddi, Highland Technology
Email address is currently out of order
Reply by vcar May 12, 2009
I want to capture data from an ADC and send it to a PC through PCI-E. I
prefer using the Virtex5 XC5VLX20T-FF323 to implement the PCI-E Lane
X4 interface. I need real-time data capture, and the input data rate
is about 500 Mbytes per second. Although PCI-E X4 could provide a
data channel of about 800 Mbytes per second (based on the Xilinx ML555
PCI-E X4 experiment data), I am afraid that if the host PC has other
PCI-E devices (say a video card or another data capture card), the
data transfer performance would be affected. Therefore I need proper
data buffering:

A. Use Block RAM to buffer captured data. The buffer is quite small: I
could use twenty 36K BRAMs at most, equivalent to 20 x 2 Kbytes = 40 Kbytes.
I do not know whether this buffering depth is enough or not. If the
PCI-E bus is very busy or another device occupies it for a long time,
the buffer will overflow.

B. Use external DDR2 memory to buffer captured data. The buffer depth
is not a problem, but I need to implement a 32-bit DDR2 interface, and
the FF323 package does not have enough free user I/O. I would have to
choose the XC5VLX30T-FF665, which will increase the cost considerably.

Now I am eager to prove that a 20~40 Kbyte buffer depth is enough for my
application. However, I have checked the PCI-E v1.1 spec for several days,
and there is no such conclusion or experimental data in it.

Please give me some suggestions. Thanks a lot!
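One way to frame the buffer-depth question is to ask how long the link may stall before a buffer of a given size overflows. The sketch below is only that arithmetic: it assumes the link delivers nothing during a stall while the ADC keeps producing 500 Mbytes/s, takes 1 Kbyte as 1024 bytes and 1 Mbyte as 10^6 bytes, and uses the 800 Mbytes/s figure from the post as the drain rate once the link resumes. The function names are illustrative. Whether real stalls stay below the computed time depends on the motherboard and cannot be derived from these numbers.

```python
# Back-of-envelope buffer sizing; all function names are illustrative.

def stall_tolerance_us(buffer_bytes: float, input_bytes_per_s: float) -> float:
    """Time in microseconds an empty buffer takes to fill with the link stalled."""
    return buffer_bytes / input_bytes_per_s * 1e6


def recovery_time_us(buffer_bytes: float, input_bytes_per_s: float,
                     drain_bytes_per_s: float) -> float:
    """Time in microseconds to empty a full buffer once the link is draining again."""
    if drain_bytes_per_s <= input_bytes_per_s:
        raise ValueError("drain rate must exceed input rate to recover")
    return buffer_bytes / (drain_bytes_per_s - input_bytes_per_s) * 1e6


INPUT_RATE = 500e6            # ADC rate from the post, bytes per second
DRAIN_RATE = 800e6            # X4 rate quoted in the post, bytes per second
BUFFER_BYTES = 40 * 1024      # option A: 40 Kbytes of block RAM

tolerance = stall_tolerance_us(BUFFER_BYTES, INPUT_RATE)
recovery = recovery_time_us(BUFFER_BYTES, INPUT_RATE, DRAIN_RATE)

print(f"40 Kbytes absorbs a stall of about {tolerance:.2f} us")
print(f"a full buffer drains in about {recovery:.2f} us at 800 Mbytes/s")
```

Under these assumptions, option A tolerates a stall of roughly 82 microseconds. The same functions can be rerun with a DDR2-sized buffer to compare option B.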