> I am myself surprised that the use of MUX inside the BDS XPCI PCI IP
> core reduced the LUT this much, but what is interesting is that trying
> to emulate internal tri-state buffer with LUTs increases the LUT usage
> quite a bit.
Good to see some actual evidence
on this often debated topic.
Thanks for the posting.
-- Mike Treseler
Reply by Kevin Brace●November 3, 2005
Hi Eric,
Here is a comparison of the BDS XPCI PCI IP core's configuration register
block implemented with MUXes (which XST maps directly to LUTs) and with
internal tri-state buffers.
The first result is the MUX version.
____________________________
Release 6.3.03i Map G.38
Xilinx Mapping Report File for Design 'pcim_top_BDS_XPCI32'
Design Information
------------------
Command Line : C:/Xilinx_webpack_6_3/bin/nt/map.exe -intstyle ise -p
xc3s200-ft256-4 -cm area -pr b -k 4 -c 100 -tx off -o
pcim_top_BDS_XPCI32_map.ncd pcim_top_BDS_XPCI32.ngd pcim_top_BDS_XPCI32.pcf
Target Device : x3s200
Target Package : ft256
Target Speed : -4
Mapper Version : spartan3 -- $Revision: 1.16.8.2 $
Mapped Date : Thu Nov 03 19:45:18 2005
Design Summary
--------------
Number of errors: 0
Number of warnings: 32
Logic Utilization:
Number of Slice Flip Flops: 289 out of 3,840 7%
Number of 4 input LUTs: 498 out of 3,840 12%
Logic Distribution:
Number of occupied Slices: 380 out of 1,920 19%
Number of Slices containing only related logic: 380 out of 380 100%
Number of Slices containing unrelated logic: 0 out of 380 0%
*See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs: 530 out of 3,840 13%
Number used as logic: 498
Number used as 16x1 RAMs: 32
Number of bonded IOBs: 49 out of 173 28%
IOB Flip Flops: 91
Number of GCLKs: 1 out of 8 12%
Total equivalent gate count for design: 10,268
Additional JTAG gate count for IOBs: 2,352
Peak Memory Usage: 80 MB
____________________________
The second result is the internal tri-state buffer version which gets
converted to LUTs by MAP.
____________________________
Release 6.3.03i Map G.38
Xilinx Mapping Report File for Design 'pcim_top_BDS_XPCI32'
Design Information
------------------
Command Line : C:/Xilinx_webpack_6_3/bin/nt/map.exe -intstyle ise -p
xc3s200-ft256-4 -cm area -pr b -k 4 -c 100 -tx off -o
pcim_top_BDS_XPCI32_map.ncd pcim_top_BDS_XPCI32.ngd pcim_top_BDS_XPCI32.pcf
Target Device : x3s200
Target Package : ft256
Target Speed : -4
Mapper Version : spartan3 -- $Revision: 1.16.8.2 $
Mapped Date : Thu Nov 03 19:54:28 2005
Design Summary
--------------
Number of errors: 0
Number of warnings: 32
Logic Utilization:
Number of Slice Flip Flops: 283 out of 3,840 7%
Number of 4 input LUTs: 567 out of 3,840 14%
Logic Distribution:
Number of occupied Slices: 422 out of 1,920 21%
Number of Slices containing only related logic: 422 out of 422 100%
Number of Slices containing unrelated logic: 0 out of 422 0%
*See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs: 600 out of 3,840 15%
Number used as logic: 567
Number used as a route-thru: 1
Number used as 16x1 RAMs: 32
Number of bonded IOBs: 49 out of 173 28%
IOB Flip Flops: 97
Number of GCLKs: 1 out of 8 12%
Total equivalent gate count for design: 11,258
Additional JTAG gate count for IOBs: 2,352
Peak Memory Usage: 80 MB
____________________________
The backend design is a simple 16 byte long I/O mapped memory
synthesized by XST.
Subtracting the backend logic usage from the total logic usage will give
you the BDS XPCI PCI IP core's logic usage.
____________________________
=========================================================================
* Final Report *
=========================================================================
Device utilization summary:
---------------------------
Selected Device : 3s200ft256-4
Number of Slices: 43 out of 1920 2%
Number of Slice Flip Flops: 39 out of 3840 1%
Number of 4 input LUTs: 44 out of 3840 1%
Number of bonded IOBs: 50 out of 173 28%
Number of TBUFs: 32 out of 960 3%
Number of GCLKs: 1 out of 8 12%
____________________________
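The subtraction described above can be worked through with the numbers from the three reports. This is only a rough sketch: it assumes the backend's XST estimate (44 LUTs, 39 FFs) carries over unchanged after MAP, and the dictionary and function names are made up for illustration.

```python
# LUT and flip-flop counts copied from the MAP and XST reports quoted above.
BACKEND = {"luts": 44, "ffs": 39}           # XST final report for the backend
TOTALS = {
    "mux": {"luts": 498, "ffs": 289},       # LUTs used as logic, slice FFs
    "tbuf": {"luts": 567, "ffs": 283},
}


def core_usage(total, backend=BACKEND):
    """Subtract the backend's share from a whole-design count."""
    return {key: total[key] - backend[key] for key in total}


core = {name: core_usage(total) for name, total in TOTALS.items()}
extra_luts = core["tbuf"]["luts"] - core["mux"]["luts"]

for name, usage in core.items():
    print(f"{name}: {usage['luts']} LUTs, {usage['ffs']} FFs")
print(f"tri-state emulation costs {extra_luts} more LUTs")
```

Under those assumptions the core alone comes to about 454 LUTs in the MUX version and 523 LUTs in the tri-state buffer version, a difference of 69 LUTs.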
Neither version used a constraint file (UCF), so the FF count might be
somewhat different if a constraint file were used, but that shouldn't
affect the LUT count too much.
Both versions (MUX version and internal tri-state buffer version) of
the BDS XPCI PCI IP core netlist were synthesized by ISE 4.2i's XST
for Spartan-II. (ISE 4.2i's XST was used because it is the last version of
XST that can generate an EDIF netlist, using a secret "-ofmt EDIF" switch.)
I am myself surprised that using MUXes inside the BDS XPCI PCI IP
core reduced the LUT count this much, but what is interesting is that
trying to emulate internal tri-state buffers with LUTs increases the LUT
usage quite a bit.
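For readers unfamiliar with the two styles being compared, here is a small behavioural model. Python stands in for HDL, every name in it is hypothetical, and it says nothing about how XST or MAP actually implement either style; it only shows that a one-hot enabled tri-state bus and an index-selected MUX read back the same register.

```python
Z = None  # models a high-impedance (undriven) output


def tbuf(value, enable):
    """Internal tri-state buffer: drives value when enabled, else floats."""
    return value if enable else Z


def resolve(drivers):
    """Resolve a shared bus; at most one driver may be active."""
    active = [d for d in drivers if d is not Z]
    if len(active) > 1:
        raise ValueError("bus contention")
    return active[0] if active else Z


def read_tristate(registers, enables):
    """Style 1: every register drives the shared bus through its own TBUF."""
    return resolve([tbuf(r, e) for r, e in zip(registers, enables)])


def read_mux(registers, select):
    """Style 2: one multiplexer picks a register by encoded index."""
    return registers[select]


# Made-up register contents, just to exercise both read paths.
regs = [0x12345678, 0x02000000, 0xFFFF0000, 0x00000001]
for i in range(len(regs)):
    one_hot = [j == i for j in range(len(regs))]
    assert read_tristate(regs, one_hot) == read_mux(regs, i)
```

The tri-state style needs one enable per driver plus a guarantee that only one is active, while the MUX style takes a single encoded select. The reports above suggest that this difference matters once the tools have to rebuild the bus out of LUTs.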
One more thing to note.
ISE WebPACK 6.3i was used for this test instead of 7.1i.
For some reason, Xilinx messed up the internal tri-state buffer
conversion algorithm in 7.1i (the problem still lingers even in SP4),
so the above design won't map at all in 7.1i.
Answer Record #20048 discusses this issue, but is not very helpful.
Kevin Brace
Eric Smith wrote:
> Kevin Brace <sa0les1@brac2ed3esi4gns5olut6ions.com> writes:
>
>> If the number we presented is not satisfactory, we have several
>>ideas to reducing the LUT count such as:
>>
>>* Using multiplexer instead of internal tri-state buffers for
>>configuration register part of the PCI IP core
>
>
> Will that help? Don't the synthesis tools translate use of tri-state
> buffers into multiplexers on most of the newer Xilinx FPGAs anyhow,
> since the parts don't have actual tri-state buffers?
--
Brace Design Solutions
Xilinx (TM) LogiCORE (TM) PCI compatible BDS XPCI PCI IP core available
for as little as $100 for non-commercial, non-profit, personal use.
http://www.bracedesignsolutions.com
Xilinx and LogiCORE are registered trademarks of Xilinx, Inc.
Reply by John_H●November 2, 2005
I'm interested in this point, too. If the core is provided as source, the
synthesis will probably handle the conversion well. If the core is an .ngo
file like the Xilinx alternative, the Xilinx mapper ends up making the
substitution, and the synthesis tool (SynplifyPro in my case) is stymied
because the black box for the core doesn't have the information to allow the
tristates in the core to be converted, so the total conversion falls apart.
"Eric Smith" <eric@brouhaha.com> wrote in message
news:qhsluelmll.fsf@ruckus.brouhaha.com...
> Kevin Brace <sa0les1@brac2ed3esi4gns5olut6ions.com> writes:
>> If the number we presented is not satisfactory, we have several
>> ideas to reducing the LUT count such as:
>>
>> * Using multiplexer instead of internal tri-state buffers for
>> configuration register part of the PCI IP core
>
> Will that help? Don't the synthesis tools translate use of tri-state
> buffers into multiplexers on most of the newer Xilinx FPGAs anyhow,
> since the parts don't have actual tri-state buffers?
Reply by Eric Smith●November 2, 2005
Kevin Brace <sa0les1@brac2ed3esi4gns5olut6ions.com> writes:
> If the number we presented is not satisfactory, we have several
> ideas to reducing the LUT count such as:
>
> * Using multiplexer instead of internal tri-state buffers for
> configuration register part of the PCI IP core
Will that help? Don't the synthesis tools translate use of tri-state
buffers into multiplexers on most of the newer Xilinx FPGAs anyhow,
since the parts don't have actual tri-state buffers?
Reply by Kevin Brace●November 2, 2005
Hi bijoy,
My company, Brace Design Solutions, has developed a Xilinx (TM) LogiCORE
(TM) PCI compatible (replacement) PCI IP core called the BDS XPCI PCI IP core.
Assuming that your project's PCI interface is target only (no
initiator) and uses only one BAR (Base Address Register), the BDS XPCI32
PCI IP core should occupy roughly 580 LUTs and 250 FFs.
That should translate to roughly 290 slices (580 / 2 = 290).
Of course, if your project uses initiator mode, the LUT consumption will
be much higher.
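The 580 / 2 figure relies on a Spartan-3 slice holding two 4-input LUTs and two flip-flops. A minimal sketch of that estimate follows; the function name is made up, and it assumes perfect packing, so the result is a lower bound rather than what MAP will report.

```python
import math

LUTS_PER_SLICE = 2  # a Spartan-3 slice holds two 4-input LUTs
FFS_PER_SLICE = 2   # and two flip-flops


def min_slices(luts, ffs):
    """Lower bound on occupied slices, assuming perfect packing."""
    return max(math.ceil(luts / LUTS_PER_SLICE),
               math.ceil(ffs / FFS_PER_SLICE))


estimate = min_slices(580, 250)
print(estimate)  # 290
```

Real packing is usually looser. In the MUX-version MAP report elsewhere in this thread, the whole test design (530 LUTs in total) occupies 380 slices, where the same formula would give 265, so some margin against a 350-slice budget seems wise.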
If the number we presented is not satisfactory, we have several
ideas for reducing the LUT count, such as:
* Using multiplexers instead of internal tri-state buffers for the
configuration register part of the PCI IP core
* Completely getting rid of initiator capability by removing
initiator-related logic
* (I personally don't like it, but . . .) Getting rid of parity checking
capability
Obviously, the custom version will cost more money than the regular
version because we will have to customize it, but let us know if you are
interested.
For more information, visit Brace Design Solutions website at
http://www.bracedesignsolutions.com.
Kevin Brace
bijoy wrote:
> Hi My company wanted to buy PCI core(33Mhz), and it should be fitted
> in spartan-3 fpga and should not take not more than 350 slices Does any
> one have got any idea from where i can get this PCI core ? pls mail to
> pbijoy@rediffmail.com
>
> rgds bijoy