Alex,
I was wondering if you made any more progress with the PCI Express DMA problem. I have a similar problem, but it concerns bursting data from the host to the Endpoint. My Windows driver sets up a buffer of data to be sent to the Endpoint and initiates a block transfer. The chipset, however, breaks this block into multiple single-DW transfers, effectively killing performance. I believe that allowing the Endpoint to become the bus master and initiate block transfers by reading from the allocated buffer on the host will lead to better bus utilization. Do you have any ideas about this, or any updates on your progress with DMA?
Thanks --Kevin
Reply by Mark McDougall●May 10, 2006
Antti wrote:
> easier ride? how much easier?
As I said:
>> If you're doing windows and need a 'grass-roots' high performance
>> driver, prepare yourself for a frustrating and challenging time.
> PS actually linux device drivers are fun, I agree, but quick dirty
> direct hardware programming on WinXP is simple as well.
There are several options these days to make life a lot easier on Windows,
for example the Jungo tools, TVicPCI, etc. But to some extent it depends
on what type of driver you're writing, what performance you need, and
what versions of Windows you need to support.
A big part of the time/effort is simply ramping up on windows device
drivers - working out what *type* of driver you need to write (is it
WDM? native kernel mode? VxD? upper/lower filter? HID?) - sometimes you
even need 2 drivers! - and how it fits into the whole mess.
Years and years ago I spent *months* writing a SCSI miniport driver for
95/NT4/2K, which included support calls to M$. Once I'd finished, it
took me 3 days to get a basic port running on Linux, and I'd *never*
written a Linux device driver before.
Regards,
--
Mark McDougall, Engineer
Virtual Logic Pty Ltd, <http://www.vl.com.au>
21-25 King St, Rockdale, 2216
Ph: +612-9599-3255 Fax: +612-9599-3266
Reply by Antti●May 10, 2006
easier ride? how much easier?
just yesterday I wrote a test application that allocates a system DMA
buffer and sends its physical address to the PCI target, which then
starts a master transaction.
the PCI backend logic needed about 20 lines of verilog
for the WinXP test application I wrote about 15 lines of Delphi code
you say on linux it would be easier?
well if you have linux box in your bedroom then maybe :)
Antti
PS actually linux device drivers are fun, I agree, but quick dirty
direct hardware programming on WinXP is simple as well.
Reply by Mark McDougall●May 9, 2006
Mark McDougall wrote:
> SongDragon wrote:
>
>> 1) device driver (let's say for linux 2.6.x) requests some
BTW if you're writing Linux device drivers as opposed to Windows
drivers, you're in for a *much* easier ride! :)
Regards,
--
Mark McDougall, Engineer
Virtual Logic Pty Ltd, <http://www.vl.com.au>
21-25 King St, Rockdale, 2216
Ph: +612-9599-3255 Fax: +612-9599-3266
Reply by Mark McDougall●May 9, 2006
SongDragon wrote:
> 1) device driver (let's say for linux 2.6.x) requests some
(snip snip)
> writes a zero to a register ("serviced descriptor"), telling the PCIe
> device the interrupt has been fielded.
> I have a number of questions regarding this. First and foremost, is
> this view of the transaction correct? Is this actually "bus
> mastering"? It seems like for PCIe, since there is no "bus", there are
> no additional requirements to handle other devices "requesting" the
> bus. So I shouldn't have to perform any bus arbitration (listen in to
> see if any of the other INT pins are being triggered, etc). Is this
> assumption correct?
Your description of events is pretty much correct. The exact registers
and sequencing will of course depend on your implementation of a DMA
controller.
You'll need a source register too unless the data is being supplied by a
FIFO or I/O "pipe" on the device.
"Bus mastering" is a PCI term and refers to the ability to initiate a
PCI transfer - which also implies the capability to request the bus.
In PCIe nomenclature, an entity that can initiate a transfer is referred
to as a "requester", and you're right, there's no arbitration involved as
such. But this is the equivalent of a PCI bus master, I suppose. The
target of the request is called the "completer".
This is where my knowledge of PCIe becomes thinner, as I'm currently in
the process of ramping up for a PCIe project myself. But I have worked
on several PCI projects so I think my foundations are valid.
For example, using a (bus-mastering) PCI core you wouldn't have to
'worry about' requesting the bus etc - initiating a request via the
back-end of the core would trigger that functionality in the core
transparently for you. As far as your device is concerned, you have
"exclusive" use of the bus - you may just have to wait a bit to get to
use it (and you may get interrupted occasionally). Arbitration etc is
not your problem.
> In PCI Express, you have to specify a bunch of things in the TLP
> header, including bus #, device #, function #, and tag. I'm not sure
> what these values should be. If the CPU were requesting a MEMREAD32,
> the values for these fields in the MEMREAD32_COMPLETION response
> would be set to the same values as were included in the
> MEMREAD32. However, since the PCIe device is actually sending out a
> MEMWRITE32 command, the values for these fields are not clear to me.
This is where I'll have to defer to others...
Regards,
--
Mark McDougall, Engineer
Virtual Logic Pty Ltd, <http://www.vl.com.au>
21-25 King St, Rockdale, 2216
Ph: +612-9599-3255 Fax: +612-9599-3266
Reply by SongDragon●May 9, 2006
Thanks for the helpful responses from everyone.
The basic idea seems to be as follows:
1) device driver (let's say for linux 2.6.x) requests some kernel-level
physical memory
2) device driver performs MEMWRITE32 (length = 1) to a register
("destination descriptor") on the PCIe device, setting destination address
in the memory
3) device driver performs MEMWRITE32 (length = 1) to a register ("length
descriptor") on the PCIe device, setting length "N" (We'll say this also
signals "GO")
4) PCIe device sends MEMWRITE32s (each length = up to 128 bytes at a time)
to _______ (what is the destination?) until length N is reached
5) PCIe device sends interrupt (for now, let's say INTA ... it could be MSI,
though)
6) device driver services interrupt and writes a zero to a register
("serviced descriptor"), telling the PCIe device the interrupt has been
fielded.
I have a number of questions regarding this. First and foremost, is this
view of the transaction correct? Is this actually "bus mastering"? It seems
like for PCIe, since there is no "bus", there are no additional requirements
to handle other devices "requesting" the bus. So I shouldn't have to perform
any bus arbitration (listen in to see if any of the other INT pins are being
triggered, etc). Is this assumption correct?
In PCI Express, you have to specify a bunch of things in the TLP header,
including bus #, device #, function #, and tag. I'm not sure what these
values should be. If the CPU were requesting a MEMREAD32, the values for
these fields in the MEMREAD32_COMPLETION response would be set to the
same values as were included in the MEMREAD32. However, since the PCIe
device is actually sending out a MEMWRITE32 command, the values for these
fields are not clear to me.
Thanks,
--Alex
Reply by Mark McDougall●May 8, 2006
SongDragon wrote:
> I am looking for some assistance writing a driver and FPGA code to
> handle DMA on a PCI Express system. The FPGA is a Xilinx V2P with a
> Xilinx x4 PCIe LogiCORE (v3.0).
Assuming the LogiCORE is capable of bus mastering, you need to
instantiate a 'DMA controller' in your FPGA; either your own design or
borrowed from another source.
A 'DMA controller' can simply be a set of registers (sometimes referred
to as 'descriptors') mapped into the host address space that allow the
software to set a DMA transfer - source address, destination address,
transfer size, control/status etc - hit a 'GO' bit, and generate an
interrupt when it's done. If you want to get more fancy, add multiple
channels, scatter-gather descriptors, request queuing, etc.
From the back side of the PCIe core, all the DMA controller does is
request the bus and issue a standard (burst in PCI-land) read/write
to/from the source/destination addresses in the register. PCIe itself
has no concept of 'DMA' - all it sees is another PCIe transfer.
Exactly how you establish the transfer in the core is dependent on the
backend interface of the LogiCORE. You shouldn't have to worry about the
format of the TLP at all if there's a decent backend interface.
> Are there any reference designs /
> sample code available?
A DMA controller IP core for PCI would still illustrate the concepts and
give some insight into what you're up for. At the risk of muddying the
waters further, there's a wishbone DMA core on opencores which can
ultimately be used for PCI DMA transfers when connected to a PCI core
(the opencores PCI core is a wishbone bridge so it bolts straight on).
Might even be worth just looking at the doco for it.
As for the driver, that will depend on what class of device you're
implementing, especially if you're talking about windows. Your best bet
there is to find an open-source/example driver for a similar device. If
you're doing windows and need a 'grass-roots' high performance driver,
prepare yourself for a frustrating and challenging time.
Regards,
--
Mark McDougall, Engineer
Virtual Logic Pty Ltd, <http://www.vl.com.au>
21-25 King St, Rockdale, 2216
Ph: +612-9599-3255 Fax: +612-9599-3266
Reply by John_H●May 8, 2006
"SongDragon" <songdrgn@g.m.a.i.l.n0.spam.com> wrote in message
news:VbOdnQfqqqqJwsLZnZ2dneKdnZydnZ2d@comcast.com...
>I am looking for some assistance writing a driver and FPGA code to handle
>DMA on a PCI Express system. The FPGA is a Xilinx V2P with a Xilinx x4 PCIe
>LogiCORE (v3.0).
>
> I've scoured through the entire PCI Express Base Specification v2.0 (the
> Solari/Intel book) and DMA isn't mentioned once, as far as I can tell. I
> suppose it is at a higher level than the base spec covers. The Xilinx
> manuals don't mention it, either. I've also googled everywhere (websites,
> groups, etc.) for mention of PCI Express and DMA, to no avail.
>
> Where should I go to find out how PCI Express handles DMA? What should the
> TLP messages look like? Are there any reference designs / sample code
> available?
>
> I look forward to hearing from the community about this issue.
>
> Thank you,
>
> --Alex Gross
The DMA isn't done by PCI Express itself - it's done by the surrounding
layers. PCI, PCI-X, and PCIe all have the ability to be a Master in a Burst
transaction. For your FPGA to DMA to another system, the FPGA needs to
issue a request to the core to master a transaction. Once granted, the
transaction will specify the location for the data transfer, which has to be
coordinated in your system, not in the PCIe core. The transaction can
provide a complete payload or may be interrupted (at least in PCI/X land) to
allow other higher-priority transactions to occur.
Look at mastering transactions and post again with further questions.
Reply by Jerry Coffin●May 8, 2006
In article
<VbOdnQfqqqqJwsLZnZ2dneKdnZydnZ2d@comcast.com>,
songdrgn@g.m.a.i.l.n0.spam.com says...
[ ... ]
> I've scoured through the entire PCI Express Base Specification v2.0 (the
> Solari/Intel book) and DMA isn't mentioned once, as far as I can tell. I
> suppose it is at a higher level than the base spec covers. The Xilinx
> manuals don't mention it, either. I've also googled everywhere (websites,
> groups, etc.) for mention of PCI Express and DMA, to no avail.
PCI (express or otherwise) doesn't really support DMA as
such. Looking for bus mastering is much more likely to
get you useful information.
--
Later,
Jerry.
The universe is a figment of its own imagination.
Reply by SongDragon●May 8, 2006
I am looking for some assistance writing a driver and FPGA code to handle
DMA on a PCI Express system. The FPGA is a Xilinx V2P with a Xilinx x4 PCIe
LogiCORE (v3.0).
I've scoured through the entire PCI Express Base Specification v2.0 (the
Solari/Intel book) and DMA isn't mentioned once, as far as I can tell. I
suppose it is at a higher level than the base spec covers. The Xilinx
manuals don't mention it, either. I've also googled everywhere (websites,
groups, etc.) for mention of PCI Express and DMA, to no avail.
Where should I go to find out how PCI Express handles DMA? What should the
TLP messages look like? Are there any reference designs / sample code
available?
I look forward to hearing from the community about this issue.
Thank you,
--Alex Gross