FPGARelated.com

PCIe latency

Started by Unknown November 6, 2006
Hi

I'm trying to design a high-speed data capture card. I'm using a Lattice
ECP2M-50 FPGA with the on-board SERDES units (MGTs in Xilinx
datasheets). I'm using an MSPS National ADC. This dual ADC has an output of
1Gb/s, and thus the combined x4 lane PCIe will match this rate. HOWEVER,
if the latency on the PCIe bus is more than 100us, then my RAM inside
the FPGA will overflow. I need to know the bus latency between TLPs,
because I need to know whether I require external RAM or whether the
design is possible at all!
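
The buffering question reduces to simple arithmetic: whatever the ADC produces while the host is not accepting TLPs has to sit on the card. A minimal sketch, assuming the worst case that the link drains nothing at all during a stall, and taking the 1 GB/s rate and 100 us latency from the post above (the function name is hypothetical):

```python
def required_buffer_bytes(input_rate_bytes_per_s: float, max_stall_s: float) -> int:
    """Bytes that accumulate on the card while the host accepts no data.

    Assumed worst case: the link drains nothing for max_stall_s seconds,
    so every byte produced during that window must be held on the card.
    """
    return int(round(input_rate_bytes_per_s * max_stall_s))


if __name__ == "__main__":
    rate = 1e9        # 1 GB/s from the dual ADC (the figure assumed in the thread)
    stall = 100e-6    # 100 us worst-case latency the on-chip RAM must cover
    need = required_buffer_bytes(rate, stall)
    print(f"{need} bytes ({need * 8 / 1e6:.1f} Mbit)")
```

The result (100,000 bytes, or 0.8 Mbit) can then be compared against the embedded block RAM capacity listed in the FPGA's datasheet; if it does not fit, external RAM is needed.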

Can someone please help me!!??

Thanks
Jason

Andreas Ehliar replied:
First of all, this post assumes you mean gigabytes per second (GB/s), since you are talking about matching the rate with a 4-lane PCIe configuration.

I see at least one real problem here:

PCIe has a line rate of 2.5 Gb/s per lane. After 8B/10B decoding you will have 250 MB/s. With 4 lanes you will get 1000 MB/s (exactly what you need).

Unfortunately, you will have protocol overhead. This means that you will not be able to push data at the rate your application requires, not even in theory (assuming that you truly need 1 GB/s and not, for example, 950 MB/s).

Perhaps you can design the card so that in an emergency (your PCIe host cannot accept your packets fast enough) it reduces the precision of your samples from 8 bits (which I assume you use) to 4 bits and somehow tells your application that the precision is not sufficient.

But quite a lot depends on your application. For example:
1. Do you just want to store as much data as you can fit into your main memory?
2. Do you intend to do some sort of real-time processing on the data in your host?
3. As mentioned earlier, do you truly need exactly 1 GB/s? In that case, you will need more than 4 lanes...

If 1, perhaps you can use 7 bits/sample instead to reduce the required bandwidth. If 2, perhaps you can do some processing on the data in the FPGA to reduce the bandwidth.

As for latency, my guess is that it should be far less than 100 us. But personally I would not feel very safe unless I had some guarantee from the host system that a certain bandwidth to main memory was reserved for my PCIe card (at least if the required bandwidth is very close to the theoretical maximum when using maximum-sized packets).

I guess your Google skills are as good as mine, but I could point out http://nowlab.cse.ohio-state.edu/publications/journal-papers/2005/liuj-ieeemicro05.pdf, where the latency and bandwidth of PCI Express based InfiniBand HCAs are tested. The latency of a small message is around 3.8 us in that case, so from that point of view 100 us should be more than enough.

/Andreas
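
Andreas's overhead point can be put into numbers. The sketch below estimates payload throughput on a PCIe 1.x x4 link for back-to-back memory-write TLPs. It assumes 8 bytes of framing and data-link overhead per TLP (STP, sequence number, LCRC, END), a 12-byte header (32-bit addressing; 64-bit addressing uses 16 bytes) and no ECRC, and it ignores DLLPs (ACKs, flow-control updates) and SKP ordered sets, so the figures are upper bounds. Function and constant names are hypothetical.

```python
# Assumed per-TLP overhead on a PCIe 1.x link, in bytes:
# STP framing (1) + sequence number (2) + LCRC (4) + END framing (1)
FRAMING_AND_DLL_BYTES = 1 + 2 + 4 + 1


def tlp_efficiency(payload_bytes: int, header_bytes: int = 12, ecrc: bool = False) -> float:
    """Fraction of link bytes that carry payload for back-to-back memory writes."""
    overhead = FRAMING_AND_DLL_BYTES + header_bytes + (4 if ecrc else 0)
    return payload_bytes / (payload_bytes + overhead)


def effective_throughput(lanes: int, payload_bytes: int, header_bytes: int = 12) -> float:
    """Upper-bound payload throughput in bytes/s; DLLPs and SKP sets are ignored."""
    raw_bits_per_lane = 2.5e9                               # 2.5 Gb/s line rate (PCIe 1.x)
    data_bytes_per_lane = raw_bits_per_lane * 8 / 10 / 8    # after 8b/10b: 250 MB/s
    return lanes * data_bytes_per_lane * tlp_efficiency(payload_bytes, header_bytes)


if __name__ == "__main__":
    for payload in (64, 128, 256):
        mb_s = effective_throughput(4, payload) / 1e6
        print(f"x4, {payload:3d}-byte payload: ~{mb_s:.0f} MB/s")
```

With 128-byte payloads this gives roughly 865 MB/s, and about 928 MB/s with 256-byte payloads, so a sustained 1 GB/s is out of reach whatever the payload size. The usable payload size is also capped by the Max_Payload_Size the host chipset supports, so it is worth checking what the target system actually negotiates.
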
Hi Andreas

Thanks for the reply.

I thought about the problem yesterday.

Originally the objective was to do on-board polyphase filtering to
reduce the bandwidth and then send the data to the PC to be processed
in real time.

The card is now used just to capture data and, supposedly, STREAM it to
the PCIe bus. The PCIe controller would then use its direct memory
access controller to store the data in the PC's RAM. Linux/Windows would
run 'real-time' processing on this data.

I have re-read the datasheet of the ADC and found that, at best,
the Effective Number of Bits (ENOB) is 7.2. Therefore I can do what you
suggested and reduce the bandwidth by cutting the samples to 7 bits.
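
Going from 8 to 7 bits per sample cuts the data rate by 12.5% (1000 MB/s becomes 875 MB/s), but only if the samples are packed densely rather than padded back out to bytes. A minimal sketch of such packing, assuming 8-bit ADC codes whose LSB is simply truncated and an MSB-first bit order; the function names and layout are hypothetical, and the FPGA logic and host software would only need to agree on whatever layout is chosen:

```python
def pack_7bit(samples: list[int]) -> bytes:
    """Pack 7-bit samples MSB-first into a contiguous byte stream."""
    acc = 0        # bit accumulator
    nbits = 0      # number of valid bits held in acc
    out = bytearray()
    for s in samples:
        if not 0 <= s < 128:
            raise ValueError(f"sample {s} does not fit in 7 bits")
        acc = (acc << 7) | s
        nbits += 7
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1                      # drop bits already emitted
    if nbits:
        out.append((acc << (8 - nbits)) & 0xFF)      # zero-pad the final partial byte
    return bytes(out)


def unpack_7bit(data: bytes, count: int) -> list[int]:
    """Inverse of pack_7bit: recover the first `count` 7-bit samples."""
    acc = 0
    nbits = 0
    samples = []
    for b in data:
        acc = (acc << 8) | b
        nbits += 8
        while nbits >= 7 and len(samples) < count:
            nbits -= 7
            samples.append((acc >> nbits) & 0x7F)
        acc &= (1 << nbits) - 1
    return samples


if __name__ == "__main__":
    raw8 = [0x00, 0xFF, 0x80, 0x7E, 0x12, 0x34, 0x56, 0x78]  # 8-bit ADC codes
    seven = [s >> 1 for s in raw8]                          # truncate the LSB
    packed = pack_7bit(seven)
    print(len(raw8), "->", len(packed), "bytes")
    assert unpack_7bit(packed, len(seven)) == seven
```

Eight 7-bit samples fit in exactly seven bytes, so the packing pattern repeats every 56 bits. Truncation is the simplest choice; rounding to the nearest 7-bit code would be an alternative.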

The problem was the protocol overhead, but with buffering (I'm using the
FPGA's on-board block RAMs and an external QDR SRAM) and only 7 bits per
sample, I think I will be able to comfortably stream the data to the PCIe
bus. I have also spoken to my Prof., and he says that we can also
reduce the sampling rate and thus bring the input data rate down to
approximately 800MB/s, which would be perfect.
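
Whether 800 MB/s leaves enough margin can be checked by asking how long the card needs to work off the backlog from one stall. A rough sketch, assuming the link carries nothing during a 100 us stall and afterwards delivers an upper-bound x4 PCIe 1.x throughput estimated from 8b/10b and per-TLP overhead alone with 128-byte payloads (DLLPs ignored); the function name is hypothetical:

```python
def drain_time_s(input_rate: float, link_rate: float, stall_s: float) -> float:
    """Seconds needed to empty the backlog built up during one stall.

    Assumes the link carries nothing during the stall and then runs at
    link_rate (bytes/s) while the ADC keeps producing at input_rate (bytes/s).
    """
    headroom = link_rate - input_rate
    if headroom <= 0:
        raise ValueError("input rate >= link throughput: the buffer can never drain")
    backlog = input_rate * stall_s
    return backlog / headroom


if __name__ == "__main__":
    adc_rate = 800e6   # ~800 MB/s after reducing the sample rate
    # x4 at 250 MB/s per lane, 128-byte payload, 20 bytes assumed overhead per TLP
    # (12-byte header + 8 bytes framing/sequence number/LCRC)
    link = 4 * 250e6 * 128 / (128 + 20)
    t = drain_time_s(adc_rate, link, 100e-6)
    print(f"headroom {(link - adc_rate) / 1e6:.0f} MB/s, "
          f"backlog {adc_rate * 100e-6 / 1e3:.0f} kB, drains in {t * 1e3:.2f} ms")
```

Under these assumptions the headroom is only about 65 MB/s, so an 80 kB backlog takes roughly 1.2 ms to clear, and a second stall within that window adds to it. As Andreas notes, the real uncertainty is how often and for how long the host stalls, which this calculation does not address.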

Can you foresee any other problems?

Thanks a million
Jason




[Quoted copy of Andreas Ehliar's reply omitted]