Reply by Mark McDougall July 26, 2006
johnp wrote:

> If you decide to use CDBG from probo.com to play
> with your PCI design, you should note that the version
> on the web only works with Win98 and earlier.
Yes, but if you're constantly re-configuring your FPGA with the PCI core in it, then you'll be re-booting constantly as well. The most time-efficient way I've found to bring up a PCI core, or even the back-end peripherals, is to have *DOS* booting off your HDD and run CDBG from there.
> CDBG has two modes of operation:
>   - one mode has the traditional peek/poke commands
>   - one mode is a C interpreter that lets you write C code
>     without screwing around with DPMI, etc.
Yes, the C interpreter is quite nice for 'scripting' tests. I brought up the opencores IDE controller with opencores DMA and opencores PCI using C code to do both PIO and DMA IDE accesses. The nice part was that I could transcribe the C code almost line-for-line into Verilog for the equivalent HDL testbench routines.

And for further re-usability, I ended up using the OCIDE core in a NIOS-based design (no PCI), which then allowed me to use the original CDBG C code almost unchanged as test routines for that project too!

So thanks John, you've saved a lot of time for me at least!

Regards,

--
Mark McDougall, Engineer
Virtual Logic Pty Ltd, <http://www.vl.com.au>
21-25 King St, Rockdale, 2216
Ph: +612-9599-3255 Fax: +612-9599-3266
Reply by johnp July 26, 2006
If you decide to use CDBG from probo.com to play
with your PCI design, you should note that the version
on the web only works with Win98 and earlier.

I haven't released a Win2K++ version for free usage yet,
but we do have one (and a Linux version as well) that are used
internally.

CDBG has two modes of operation:
  - one mode has the traditional peek/poke commands
  - one mode is a C interpreter that lets you write C code
    without screwing around with DPMI, etc.
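
As a rough illustration of the C mode (the register offset and the helper
names pci_read32()/pci_write32() below are just placeholders for the sake of
the example, not necessarily the exact built-ins), a bring-up test script
might look something like this:

    /* walk a test pattern through the card's BAR0 registers              */
    /* pci_read32()/pci_write32() stand in for whatever access routines   */
    /* the environment provides                                           */
    unsigned base = 0xF0000000;      /* BAR0 as assigned at boot          */
    unsigned i, rd;

    for (i = 0; i < 16; i++)
        pci_write32(base + i*4, 0xA5A50000 + i);     /* poke a pattern    */

    for (i = 0; i < 16; i++) {
        rd = pci_read32(base + i*4);                 /* peek it back      */
        if (rd != 0xA5A50000 + i)
            printf("mismatch at %08x: got %08x\n", base + i*4, rd);
    }

The same loops translate almost line-for-line into HDL testbench tasks,
which is a large part of what makes the C mode handy for bring-up.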

ALSO - I suspect it will be next to impossible to get 54MB/sec
transfer rates without bus mastering.

John Providenza

Mark McDougall wrote:
> Brian McFarland wrote:
>
> > Well I gave up on trying to find free ( and legal :-/ ) info about PCI
> > online and ordered the mindshare PCI book. It hasn't arrived yet, but
> > I began just writing my own PCI module. I was kinda hoping to be able
> > to do this project w/o getting too deep into the specs of PCI, but I
> > don't think that's going to happen.
>
> I'd suggest hooking up the opencores PCI core if you have available
> hardware just to get a feel for what's involved. Once you get the gist
> of how it hangs together it's really quite simple to hook up something
> to the back end. The DMA controller shouldn't be that difficult either
> (although I realise I'm speaking with the benefit of hindsight).
>
> BTW I'd suggest you look into CDBG from probo.com when bringing up a PCI
> core.
>
> From there, you could invest a little time in benchmarking your
> application. Even if you end up deciding that the opencores PCI core is
> not the way to go, you've no doubt (a) learned something about PCI and
> (b) established a performance testbench for your final solution.
>
> BTW the Mindshare book is certainly going to be a big help in ramping up
> on PCI.
>
> IMHO, you're going to need bus-mastering DMA to get 54MB/s out of PCI,
> and that's a *lot* of effort to do from scratch! Just verifying the
> design is going to be a mammoth effort - take a look at the size of the
> testbench module in the opencores PCI design to get an idea!!!
>
> Regards,
>
> --
> Mark McDougall, Engineer
> Virtual Logic Pty Ltd, <http://www.vl.com.au>
> 21-25 King St, Rockdale, 2216
> Ph: +612-9599-3255 Fax: +612-9599-3266
Reply by Mark McDougall July 26, 2006
Brian McFarland wrote:

> Well I gave up on trying to find free ( and legal :-/ ) info about PCI
> online and ordered the mindshare PCI book. It hasn't arrived yet, but
> I began just writing my own PCI module. I was kinda hoping to be able
> to do this project w/o getting too deep into the specs of PCI, but I
> don't think that's going to happen.
I'd suggest hooking up the opencores PCI core if you have available hardware just to get a feel for what's involved. Once you get the gist of how it hangs together it's really quite simple to hook up something to the back end. The DMA controller shouldn't be that difficult either (although I realise I'm speaking with the benefit of hindsight).

BTW I'd suggest you look into CDBG from probo.com when bringing up a PCI core.

From there, you could invest a little time in benchmarking your application. Even if you end up deciding that the opencores PCI core is not the way to go, you've no doubt (a) learned something about PCI and (b) established a performance testbench for your final solution.

BTW the Mindshare book is certainly going to be a big help in ramping up on PCI.

IMHO, you're going to need bus-mastering DMA to get 54MB/s out of PCI, and that's a *lot* of effort to do from scratch! Just verifying the design is going to be a mammoth effort - take a look at the size of the testbench module in the opencores PCI design to get an idea!!!

Regards,

--
Mark McDougall, Engineer
Virtual Logic Pty Ltd, <http://www.vl.com.au>
21-25 King St, Rockdale, 2216
Ph: +612-9599-3255 Fax: +612-9599-3266
Reply by Brian McFarland July 26, 2006
Well I gave up on trying to find free ( and legal :-/ ) info about PCI
online and ordered the mindshare PCI book.  It hasn't arrived yet, but
I began just writing my own PCI module.  I was kinda hoping to be able
to do this project w/o getting too deep into the specs of PCI, but I
don't think that's going to happen.


Mark McDougall wrote:
> Brian McFarland wrote:
>
> > -- although i'm not sure how much processing it will take because our
> > customers are writing the software that does it and I have no direct
> > way to contact their developers.
>
> In that case my response would be that I don't have sufficient detail in
> the requirements to propose a solution.
>
> Regards,
>
> --
> Mark McDougall, Engineer
> Virtual Logic Pty Ltd, <http://www.vl.com.au>
> 21-25 King St, Rockdale, 2216
> Ph: +612-9599-3255 Fax: +612-9599-3266
Reply by Mark McDougall July 24, 2006
Brian McFarland wrote:

> -- although i'm not sure how much processing it will take because our
> customers are writing the software that does it and I have no direct
> way to contact their developers.
In that case my response would be that I don't have sufficient detail in the requirements to propose a solution.

Regards,

--
Mark McDougall, Engineer
Virtual Logic Pty Ltd, <http://www.vl.com.au>
21-25 King St, Rockdale, 2216
Ph: +612-9599-3255 Fax: +612-9599-3266
Reply by Mark McDougall July 24, 2006
Brian McFarland wrote:

> It's an I/O interface that will be constantly receiving and
> transmitting something at 270 Mbps in both directions using 8B/10B
> encoding. Which means potentially, we could want the card to transmit
> and receive 27 MB/s. However, in this particular application, the rate
> of the real data will be closer to just 2MB/s. If there's latency due
> to buffering & block transfers, it's probably not a concern as long as I
> can have large enough FIFOs on the FPGA that they never become empty
> while I'm filling the PC-side buffer. The whole reason for this
> interface is to modify the input data stream and send it back out, and
> the delay caused by CPU time is probably going to be considerably more
> -- although I'm not sure how much processing it will take because our
> customers are writing the software that does it and I have no direct
> way to contact their developers.
It's a bit difficult to give an accurate answer with the above-mentioned "requirements specification". ;)

It's going to depend on how much latency you can tolerate. If you were able to wait for a few KB to be accumulated on each side before transferring, you'd have absolutely no problem achieving your 27MB/s in each direction. Of course, that introduces large delays in your stream.

OTOH if the application isn't tolerant of large latencies and, for example, you needed to do single 32-bit PIO transactions, then we've seen fetches from *memory* on the back-end of the PCI core take up to 20 PCI clocks to complete on the host (shave a few clocks off if your data is in a register or read-ahead FIFO, for example). That brings your throughput down to around 5MB/s - total!

There's a lot of latency introduced when pushing data through the PCI core FIFOs in each direction. Obviously if you can stream large chunks, that latency becomes insignificant w.r.t. throughput. PCI retries on posted reads also add to the equation.

Nutshell - you need to work out *exactly* what latencies you can tolerate.

Regards,

--
Mark McDougall, Engineer
Virtual Logic Pty Ltd, <http://www.vl.com.au>
21-25 King St, Rockdale, 2216
Ph: +612-9599-3255 Fax: +612-9599-3266
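
As a back-of-envelope check on those figures (this assumes a 33MHz, 32-bit
bus and the rough clocks-per-transaction numbers quoted above; it ignores
arbitration, retries and FIFO latency entirely):

    #include <stdio.h>

    /* crude PCI throughput estimate: bytes moved per transaction divided
     * by clocks per transaction, at 33M PCI clocks per second            */
    static double mbytes_per_sec(double bytes_per_xact, double clocks_per_xact)
    {
        return 33e6 * bytes_per_xact / clocks_per_xact / 1e6;
    }

    int main(void)
    {
        /* single 32-bit PIO read completing in ~20 clocks                */
        printf("single-word PIO : %5.1f MB/s\n", mbytes_per_sec(4.0, 20.0));
        /* 16-beat (64-byte) burst with ~20 clocks of overhead plus data  */
        printf("16-beat burst   : %5.1f MB/s\n", mbytes_per_sec(64.0, 20.0));
        return 0;
    }

The single-word case works out to roughly 6-7MB/s before retries, which is
the same ballpark as the ~5MB/s above once disconnects and retries are
counted; the burst case is why bus-mastering DMA in decent-sized chunks is
what gets you anywhere near the 100MB/s figures mentioned elsewhere in the
thread.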
Reply by Brian McFarland July 21, 2006
I just found out how much the license for the Altera core costs.
Considering that production quantity will be relatively low, I would
like to avoid using it if this is possible / practical.  Does the
opencores one do bus mastering well enough to achieve the kind of
transfer rates I'm hoping for?


Mark McDougall wrote:

> What type of transfers are you looking at? Will it be PIO (single
> byte/word/dword) transfers? Initiated by the PC? Or DMA, initiated by
> the card? Is the data isochronous 27MB/s? Or can it be buffered and
> transferred periodically in large chunks?
It's an I/O interface that will be constantly receiving and transmitting something at 270 Mbps in both directions using 8B/10B encoding. Which means potentially, we could want the card to transmit and receive 27 MB/s. However, in this particular application, the rate of the real data will be closer to just 2MB/s.

If there's latency due to buffering & block transfers, it's probably not a concern as long as I can have large enough FIFOs on the FPGA that they never become empty while I'm filling the PC-side buffer. The whole reason for this interface is to modify the input data stream and send it back out, and the delay caused by CPU time is probably going to be considerably more -- although I'm not sure how much processing it will take because our customers are writing the software that does it and I have no direct way to contact their developers.
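
For reference, the 27 MB/s figure is just the 8B/10B overhead worked through
(plain arithmetic, nothing specific to the actual hardware):

    270 Mbit/s line rate x 8/10 (8B/10B: 8 data bits per 10 line bits) = 216 Mbit/s of payload
    216 Mbit/s / 8 bits per byte                                       = 27 Mbyte/s each way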
Reply by Mark McDougall July 19, 2006
Brian McFarland wrote:

> I know that theoretically, the max transfer
> rate of the bus is 133 MB/s w/ 33MHz systems.
That's the raw rate based purely on the signaling. Once you add the PCI protocol on top of that, which includes arbitration phases etc, IIRC the maximum *theoretical* data transfer rate is more like 120MB/s.

Having said that, I've worked on a design which included an Altera PCI core and a DMA bus master transferring large chunks. In a *desktop* PC running Win2K, the *sustained* throughput was around 100MB/s. FWIW, IIRC the same hardware under Linux didn't get much over 80MB/s, but that's another story.
> Ideally, I would like to be able to guarantee 54 MB/s with pretty much
> equal I/O rates (27MB/s into and out of the device). Most of the time,
> rates should be lower than that, but just about the max I could ever
> need it to be.
What type of transfers are you looking at? Will it be PIO (single byte/word/dword) transfers? Initiated by the PC? Or DMA, initiated by the card? Is the data isochronous 27MB/s? Or can it be buffered and transferred periodically in large chunks?

Your answers to the above questions will determine the suitability or otherwise of any potential solution.

As Eric pointed out elsewhere in this thread, typically host PCI chipsets won't burst more than a single cache line. And even that requires attention to how you configure your PCI memory space.

Single host reads can be very inefficient, as often the target must disconnect whilst the data is fetched. In the meantime, the bus is free for other peripherals to grab. For example, with the opencores PCI core (for which *all* reads are posted), we're seeing reads disconnected *twice* before the third succeeds, albeit on a non-intel platform where the host can be quite slow to retry.

In a nutshell, if you're bus-mastering DMA in reasonable chunks, then 54MB/s should be easily achievable. If not, then you need to characterise your transfer profile before I could comment any further.

Regards,

--
Mark McDougall, Engineer
Virtual Logic Pty Ltd, <http://www.vl.com.au>
21-25 King St, Rockdale, 2216
Ph: +612-9599-3255 Fax: +612-9599-3266
Reply by Hal Murray July 19, 2006
>This is my experience with anything PCI related, so I'm still not very
>clear on whether I can get away with a target, or if I'll need
>mastering/DMA capabilities. I know that theoretically, the max transfer
>rate of the bus is 133 MB/s w/ 33MHz systems. The intended customer of
>this thing is going for low cost, so I'm not going to assume that it
>will be used with a computer that supports 66MHz or 64-bit transfers.
>Ideally, I would like to be able to guarantee 54 MB/s with pretty much
>equal I/O rates (27MB/s into and out of the device).
The key idea is that most CPUs (or host bridges) only transfer one word (32 bits) per transaction when reading/writing to a PCI target.

Look at the timing diagrams for simple target transfers. How many cycles do they take? 54 is about 40% of 133, so at one word per transaction you'd have to complete a whole transaction roughly every 2 cycles. It just isn't going to happen.

My only one-word-per-transaction observation is several/many years old. Things might be better now. I wouldn't bet on it without seeing a nice picture on a scope.

--
The suespammers.org mail server is located in California. So are all my other mailboxes. Please do not send unsolicited bulk e-mail or unsolicited commercial e-mail to my suespammers.org address or any of my other addresses. These are my opinions, not necessarily my employer's. I hate spam.
Reply by Antti Lukats July 19, 2006
"bart" <bart.borosky@latticesemi.com> schrieb im Newsbeitrag 
news:1153331232.004936.238120@p79g2000cwp.googlegroups.com...
>I am no lawyer, but it is my understanding that the Lattice reference
> design is intended for use on Lattice devices. The license agreement
> says something like: "for the sole purpose of programming Lattice
> programmable logic devices."
>
> If you use the Lattice PCI reference design, it sounds like you should
> use the LatticeECP2 device or another Lattice FPGA, a list of which you
> can find here:
> http://www.latticesemi.com/products/fpga/index.cfm
>
> Hope this helps.
> Bart Borosky, Lattice
Hi Bart,

Well yes - everyone should read the license, of course. I would gladly use that Lattice PCI core on Lattice boards, and hopefully one day I will - but as I don't have any Lattice PCI FPGA boards, I have evaluated the Lattice PCI core on different PCI boards.

[snip - self censoring, the deleted text goes to Lattice in private]

All the use of it I have ever done is initial FPGA board testing - nothing more. I have never considered using it in any products based on non-Lattice silicon (because of the license), but the fact that this core is available brings some attention to Lattice whenever someone mentions it.

I should have mentioned the license clause of course - but the link I provided did land on the license agreement and not a direct download.

Antti