FPGA : PCI core needed

Started by Kevin Brace November 2, 2005
Hi bijoy,

My company, Brace Design Solutions, has developed a Xilinx (TM) LogiCORE
(TM) PCI compatible (replacement) PCI IP core called the BDS XPCI PCI IP core.
Assuming that your project's PCI interface is target only (no
initiator) and uses only one BAR (Base Address Register), the BDS XPCI32
PCI IP core should occupy roughly 580 LUTs and 250 FFs.
That should translate to roughly 290 slices (580 / 2 = 290).
Of course, if your project uses initiator mode, the LUT consumption will
be much higher.
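
As a rough check of the slice arithmetic above, here is a minimal Python sketch. It assumes the Spartan-3 slice layout of two 4-input LUTs and two flip-flops per slice, and it assumes perfect packing, so the result is a lower bound rather than what MAP will actually report. The function name `min_slices` is made up for this illustration.

```python
import math

# Assumption: a Spartan-3 slice holds two 4-input LUTs and two flip-flops.
LUTS_PER_SLICE = 2
FFS_PER_SLICE = 2


def min_slices(luts, ffs):
    """Lower bound on slices, assuming every slice is packed completely."""
    return max(math.ceil(luts / LUTS_PER_SLICE), math.ceil(ffs / FFS_PER_SLICE))


estimate = min_slices(580, 250)  # LUT and FF figures quoted in the post
fits = estimate <= 350           # 350 slices is the limit bijoy asked for
print(estimate, fits)
```

Real designs rarely pack this tightly, so the mapped slice count will usually come out higher than this estimate.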
         If the numbers we presented are not satisfactory, we have several
ideas for reducing the LUT count, such as:

* Using multiplexers instead of internal tri-state buffers for the
configuration register part of the PCI IP core

* Completely getting rid of initiator capability by removing initiator
related logic

* (I personally don't like it, but . . .) Getting rid of parity checking
capability


Obviously, the custom version will cost more money than the regular
version because we will have to customize it, but let us know if you are
interested.
For more information, visit the Brace Design Solutions website at
http://www.bracedesignsolutions.com.


Kevin Brace


bijoy wrote:
> Hi My company wanted to buy PCI core(33Mhz), and it should be fitted
> in spartan-3 fpga and should not take not more than 350 slices Does
> any one have got any idea from where i can get this PCI core ? pls
> mail to pbijoy@rediffmail.com
>
> rgds bijoy

--
Brace Design Solutions
Xilinx (TM) LogiCORE (TM) PCI compatible BDS XPCI PCI IP core available
for as little as $100 for non-commercial, non-profit, personal use.
http://www.bracedesignsolutions.com
Xilinx and LogiCORE are registered trademarks of Xilinx, Inc.

Kevin Brace <sa0les1@brac2ed3esi4gns5olut6ions.com> writes:
> If the number we presented is not satisfactory, we have several
> ideas to reducing the LUT count such as:
>
> * Using multiplexer instead of internal tri-state buffers for
> configuration register part of the PCI IP core

Will that help? Don't the synthesis tools translate use of tri-state buffers into multiplexers on most of the newer Xilinx FPGAs anyhow, since the parts don't have actual tri-state buffers?
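
To make the question concrete, here is a small behavioural model in Python. It only illustrates why a tri-state bus with one-hot enables can be replaced by an AND-OR multiplexer; it is not how XST or MAP performs the conversion, and it says nothing about the resulting LUT count. The register values and the names `tristate_bus`, `and_or_mux` and `read_both` are invented for this sketch.

```python
def tristate_bus(drivers):
    """Resolve a shared bus from (enabled, value) pairs, like internal TBUFs."""
    active = {value for enabled, value in drivers if enabled}
    if not active:
        return None  # nobody drives the bus: it floats (high-Z)
    if len(active) > 1:
        raise ValueError("bus contention: drivers disagree")
    return active.pop()


def and_or_mux(drivers):
    """AND each value with its enable, then OR the results together."""
    result = 0
    for enabled, value in drivers:
        if enabled:
            result |= value
    return result


# Hypothetical configuration register contents, one driver per register.
REGISTERS = [0x12345678, 0x02000000, 0x00000001, 0xFFFF0000]


def read_both(select):
    """Read register `select` through both models using one-hot enables."""
    drivers = [(index == select, value) for index, value in enumerate(REGISTERS)]
    return tristate_bus(drivers), and_or_mux(drivers)


for select in range(len(REGISTERS)):
    bus_value, mux_value = read_both(select)
    print(select, hex(bus_value), hex(mux_value), bus_value == mux_value)
```

As long as exactly one enable is active, both models return the same word. The two differ only in the cases a real design has to avoid anyway: no driver enabled, or two drivers enabled with different values.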
I'm interested in this point, too.  If the core is provided as source, the 
synthesis tool will probably handle the conversion well.  If the core is an 
.ngo file like the Xilinx alternative, the Xilinx mapper ends up making the 
substitution, and the synthesis tool (SynplifyPro in my case) is stymied 
because the black box for the core doesn't carry the information needed to 
convert the tristates inside the core, so the total conversion falls apart.


"Eric Smith" <eric@brouhaha.com> wrote in message 
news:qhsluelmll.fsf@ruckus.brouhaha.com...
> Kevin Brace <sa0les1@brac2ed3esi4gns5olut6ions.com> writes:
>> If the number we presented is not satisfactory, we have several
>> ideas to reducing the LUT count such as:
>>
>> * Using multiplexer instead of internal tri-state buffers for
>> configuration register part of the PCI IP core
>
> Will that help? Don't the synthesis tools translate use of tri-state
> buffers into multiplexers on most of the newer Xilinx FPGAs anyhow,
> since the parts don't have actual tri-state buffers?

Hi Eric,

Here is a comparison of the BDS XPCI PCI IP core's configuration register 
block implemented with MUXes (which XST maps directly to LUTs) and with 
internal tri-state buffers.
The first result is the MUX version.

____________________________
Release 6.3.03i Map G.38
Xilinx Mapping Report File for Design 'pcim_top_BDS_XPCI32'

Design Information
------------------
Command Line   : C:/Xilinx_webpack_6_3/bin/nt/map.exe -intstyle ise -p
xc3s200-ft256-4 -cm area -pr b -k 4 -c 100 -tx off -o
pcim_top_BDS_XPCI32_map.ncd pcim_top_BDS_XPCI32.ngd pcim_top_BDS_XPCI32.pcf
Target Device  : x3s200
Target Package : ft256
Target Speed   : -4
Mapper Version : spartan3 -- $Revision: 1.16.8.2 $
Mapped Date    : Thu Nov 03 19:45:18 2005

Design Summary
--------------
Number of errors:      0
Number of warnings:   32
Logic Utilization:
   Number of Slice Flip Flops:         289 out of   3,840    7%
   Number of 4 input LUTs:             498 out of   3,840   12%
Logic Distribution:
   Number of occupied Slices:                          380 out of   1,920   19%
     Number of Slices containing only related logic:     380 out of     380  100%
     Number of Slices containing unrelated logic:          0 out of     380    0%
       *See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs:            530 out of   3,840   13%
   Number used as logic:                498
   Number used as 16x1 RAMs:             32
   Number of bonded IOBs:               49 out of     173   28%
     IOB Flip Flops:                    91
   Number of GCLKs:                     1 out of       8   12%

Total equivalent gate count for design:  10,268
Additional JTAG gate count for IOBs:  2,352
Peak Memory Usage:  80 MB
____________________________



The second result is the internal tri-state buffer version which gets 
converted to LUTs by MAP.

____________________________
Release 6.3.03i Map G.38
Xilinx Mapping Report File for Design 'pcim_top_BDS_XPCI32'

Design Information
------------------
Command Line   : C:/Xilinx_webpack_6_3/bin/nt/map.exe -intstyle ise -p
xc3s200-ft256-4 -cm area -pr b -k 4 -c 100 -tx off -o
pcim_top_BDS_XPCI32_map.ncd pcim_top_BDS_XPCI32.ngd pcim_top_BDS_XPCI32.pcf
Target Device  : x3s200
Target Package : ft256
Target Speed   : -4
Mapper Version : spartan3 -- $Revision: 1.16.8.2 $
Mapped Date    : Thu Nov 03 19:54:28 2005

Design Summary
--------------
Number of errors:      0
Number of warnings:   32
Logic Utilization:
   Number of Slice Flip Flops:         283 out of   3,840    7%
   Number of 4 input LUTs:             567 out of   3,840   14%
Logic Distribution:
   Number of occupied Slices:                          422 out of   1,920   21%
     Number of Slices containing only related logic:     422 out of     422  100%
     Number of Slices containing unrelated logic:          0 out of     422    0%
       *See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs:            600 out of   3,840   15%
   Number used as logic:                567
   Number used as a route-thru:           1
   Number used as 16x1 RAMs:             32
   Number of bonded IOBs:               49 out of     173   28%
     IOB Flip Flops:                    97
   Number of GCLKs:                     1 out of       8   12%

Total equivalent gate count for design:  11,258
Additional JTAG gate count for IOBs:  2,352
Peak Memory Usage:  80 MB
____________________________



The backend design is a simple 16 byte long I/O mapped memory 
synthesized by XST.
Subtracting the backend logic usage from the total logic usage will give 
you the BDS XPCI PCI IP core's logic usage.

____________________________
=========================================================================
*                            Final Report                               *
=========================================================================

Device utilization summary:
---------------------------

Selected Device : 3s200ft256-4

  Number of Slices:                      43  out of   1920     2%
  Number of Slice Flip Flops:            39  out of   3840     1%
  Number of 4 input LUTs:                44  out of   3840     1%
  Number of bonded IOBs:                 50  out of    173    28%
  Number of TBUFs:                       32  out of    960     3%
  Number of GCLKs:                        1  out of      8    12%
____________________________
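
The subtraction described above can be sketched in a few lines of Python. The figures are copied from the two MAP reports and the XST report in this post. Two things are assumptions of this sketch and not statements from the reports: the LUT figure is taken from the "Total Number 4 input LUTs" row, and slice counts are treated as if they were additive, which they are not in general because MAP can pack unrelated logic differently. The backend also uses 32 TBUFs that MAP converts as well, so treat the results as rough estimates.

```python
# Totals for core plus backend, from the two MAP reports.
MAP_REPORTS = {
    "mux":      {"luts": 530, "ffs": 289, "slices": 380},
    "tristate": {"luts": 600, "ffs": 283, "slices": 422},
}

# Backend alone (the small I/O mapped memory), from the XST report.
BACKEND = {"luts": 44, "ffs": 39, "slices": 43}


def core_usage(total, backend):
    """Estimate the core's share by subtracting the backend's usage."""
    return {resource: total[resource] - backend[resource] for resource in total}


estimates = {name: core_usage(report, BACKEND)
             for name, report in MAP_REPORTS.items()}

for name, usage in estimates.items():
    print(name, usage)

# Extra LUTs used when the tri-state buffers are emulated by MAP.
extra_luts = estimates["tristate"]["luts"] - estimates["mux"]["luts"]
print("extra LUTs in tri-state version:", extra_luts)
```

Under these assumptions the MUX version comes out at about 486 LUTs and 250 FFs for the core, and the tri-state version at about 556 LUTs and 244 FFs, a difference of 70 LUTs.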


Neither version used a constraint file (UCF), so the FF count might be 
somewhat different if a constraint file were used, but that shouldn't 
affect the LUT count too much.
Both versions (MUX version and internal tri-state buffer version) of the 
BDS XPCI PCI IP core netlist were synthesized by ISE 4.2i's XST for 
Spartan-II. (ISE 4.2i's XST was used because it is the last version of 
XST that can generate an EDIF netlist, using a secret "-ofmt EDIF" switch.)
I am surprised myself that using MUXes inside the BDS XPCI PCI IP core 
reduced the LUT count this much, but what is interesting is that trying 
to emulate internal tri-state buffers with LUTs increases the LUT usage 
quite a bit.
         One more thing to note.
ISE WebPACK 6.3i was used for this test instead of 7.1i.
For some reason, Xilinx broke the internal tri-state buffer conversion 
algorithm in 7.1i (the problem lingers even in SP4), so the above design 
won't map at all in 7.1i.
Answer Record #20048 discusses this issue, but it is not very helpful.


Kevin Brace


Eric Smith wrote:
> Kevin Brace <sa0les1@brac2ed3esi4gns5olut6ions.com> writes:
>
>> If the number we presented is not satisfactory, we have several
>> ideas to reducing the LUT count such as:
>>
>> * Using multiplexer instead of internal tri-state buffers for
>> configuration register part of the PCI IP core
>
> Will that help? Don't the synthesis tools translate use of tri-state
> buffers into multiplexers on most of the newer Xilinx FPGAs anyhow,
> since the parts don't have actual tri-state buffers?

Kevin Brace wrote:

> I am myself surprised that the use of MUX inside the BDS XPCI PCI IP
> core reduced the LUT this much, but what is interesting is that trying
> to emulate internal tri-state buffer with LUTs increases the LUT usage
> quite a bit.

Good to see some actual evidence on this often debated topic.
Thanks for the posting.

-- Mike Treseler