
Xilinx MacFir5.0 - Block RAM requirements

Started by Nemesis June 20, 2005
Hi all,
I implemented a fir filter using the MacFir5.0 core from Xilinx.
The filter has 2 coefficient sets of 33 taps, with 14-bit
coefficients and data.

I was looking at the resource utilization and found something strange:
the filter requires 33 multipliers and 33 block RAMs.

I can't understand why it requires so many block RAMs! Each block RAM
should store 1k x 18 bits!
If I set the core to use distributed RAM for the coefficients only, or
for the data only, this number doesn't change. Of course it goes to zero
when I set the core to use distributed RAM for both coefficients and
data.

I read in the User Guide that multipliers and block RAMs share routing
resources.
Is that the cause of the large number of BRAMs used?
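For scale, the storage the filter actually needs can be worked out from the figures in the post. This is only a rough sketch: it takes the "1k x 18 bits" block RAM size quoted above at face value, and the variable names are illustrative.

```python
# Figures taken from the post above.
COEFF_SETS = 2
TAPS = 33
WORD_BITS = 14

# One block RAM configured as 1k x 18 bits (size quoted in the post).
BRAM_BITS = 1024 * 18

coeff_bits = COEFF_SETS * TAPS * WORD_BITS   # all coefficient sets
data_bits = TAPS * WORD_BITS                 # one delay line of samples
total_bits = coeff_bits + data_bits

print(f"coefficients: {coeff_bits} bits")
print(f"data samples: {data_bits} bits")
print(f"fraction of one block RAM: {total_bits / BRAM_BITS:.1%}")
```

On these numbers, capacity alone cannot explain a count of 33 block RAMs: coefficients and data together fill well under a tenth of a single one.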

AFAIK, the core will pack together a MULT18x18 and a BRAM with a connection between the two to maximize speed. The coefficients end up getting replicated in each BRAM.

If you want to reduce the area, check whether your input data rate is low compared to the clock rate. If so, you can tell the core generator about this and it will re-use the same multiplier for multiple taps. If you can't do this, consider using DA (distributed arithmetic) filters.

-Jim
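The trade-off Jim describes can be put in numbers. The following is a simplified model, not the core's documented behaviour: it assumes one multiplier can process one tap per clock cycle, so it can serve `clock / sample_rate` taps per input sample. The function name and parameters are made up for illustration.

```python
import math

def estimate_multipliers(taps: int, clock_hz: float, sample_rate_hz: float) -> int:
    """Rough multiplier count for a time-multiplexed MAC FIR (assumed model)."""
    if clock_hz < sample_rate_hz:
        raise ValueError("clock must be at least the sample rate")
    # Whole clock cycles available per input sample; in each cycle one
    # multiplier can process one tap.
    cycles_per_sample = int(clock_hz // sample_rate_hz)
    return math.ceil(taps / cycles_per_sample)

# Clock equal to the sample rate: one multiplier per tap.
print(estimate_multipliers(33, 64e6, 64e6))    # 33
# Clock at twice the sample rate: each multiplier handles two taps.
print(estimate_multipliers(33, 128e6, 64e6))   # 17
```

If one block RAM is paired with each multiplier, as described above, the block RAM count would shrink along with the multiplier count under this model.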
Jim George wrote:

[MAC_FIR and BRAM]
> AFAIK, the core will pack together a MULT18x18 and a BRAM with a connection between the two to maximize speed. The coefficients end up getting replicated in each BRAM.
Ah! OK, so this is the cause.
> If you want to reduce the area, check to see if your input data rate is low compared to the clock rate. If so, you can tell the core gen about this and it will re-use the same multiplier for multiple taps.
This is not my situation. I have a data rate of 64 MHz and I need to decimate it to 32 MHz. When I synthesize the core with sample_rate = clock = 64 MHz, I get a maximum clock frequency of 180 MHz (V2Pro50-6), but if I synthesize the core with sample_rate = 64 MHz and clock = 128 MHz, XST reports the maximum clock to be 126 MHz, which falls short of the required 128 MHz.
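Because the output is decimated by 2, only every second filter output is actually needed, and a decimating FIR can compute just those. The sketch below is plain Python for illustration only; it does not describe what the MAC FIR core does internally, and the function names, test samples and 5-tap filter are arbitrary.

```python
def fir_filter(x, h):
    """Direct-form FIR: y[n] = sum_k h[k] * x[n - k], zero initial state."""
    return [
        sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(x))
    ]

def fir_decimate(x, h, m):
    """Compute only every m-th FIR output (n = 0, m, 2m, ...)."""
    return [
        sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(0, len(x), m)
    ]

x = list(range(1, 17))   # arbitrary integer test samples
h = [1, 2, 3, 2, 1]      # arbitrary symmetric 5-tap filter

full = fir_filter(x, h)
decimated = fir_decimate(x, h, 2)

# Same result as filtering at the full rate and discarding every other
# output, but with half as many multiply-accumulates.
assert decimated == full[::2]
```

The multiply-accumulate workload is therefore set by the 32 MHz output rate rather than the 64 MHz input rate.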
> If you can't do this, consider using DA filters.
I think I'll try them. For now I just checked "use distributed ram" for both coefficients and data, and at the cost of some extra slices I got 0 BRAMs used. But I have one last question: are these BRAMs really free to use?
BRAMs are "free" if you have no other use for them.
The chip has a certain number of BRAMs, and if you do not use them, they
just sit there and do nothing for you. So, they are free until you need
more than you have, then they are precious.
Peter Alfke, Xilinx

Peter Alfke wrote:
> BRAMs are "free" if you have no other use for them.
I have lots of other uses for them :-)
> The chip has a certain number of BRAMs, and if you do not use them, they just sit there and do nothing for you. So, they are free until you need more than you have, then they are precious.
I need them in other applications that will use the same FPGA. This filter is only a small part of the whole project, so I'm investigating the causes of the large number of BRAMs used by the MAC_FIR. Maybe my question was not so clear: I just wanted to know if these BRAMs that share routing resources with the multipliers will be available for other cores that need them (e.g. the FFT).
On 21 Jun 2005 09:40:16 -0700, "Nemesis" <nemesis2001@gmx.it> wrote:
> Maybe my question was not so clear: I just wanted to know if these BRAMs that share routing resources with the multipliers will be available for other cores that need them (e.g. the FFT).
The BRAMs are still available when using the co-located multiplier, except for BRAMs in the widest data path mode, i.e. you can use the co-located BRAM in x1, x2, x4, x9 mode, but not x18. This is because the BRAM and MPY share the connection resources to the rest of the fabric. In x18 mode the BRAM uses everything. In the narrower modes, there are enough connection resources remaining to fully support the MPY.

Philip

Philip Freidin
Fliptronics

Philip Freidin wrote:
> On 21 Jun 2005 09:40:16 -0700, "Nemesis" <nemesis2001@gmx.it> wrote:
> > Maybe my question was not so clear, I just wanted to know if these BRAMS that shares routing resources with the Multipliers will be available for other cores that need them (like the FFT i.e.).
>
> The BRAMs are still available when using the co-located multiplier, except for BRAMs in the widest data path mode. I.E. you can use the co-located BRAM in x1, x2, x4, x9 mode, but not x18.
I'm pretty sure the widest mode is x36 and x18 is O.K. with co-located multiplier...
On 22 Jun 2005 14:40:21 -0700, "Gabor" <gabor@alacron.com> wrote:
> Philip Freidin wrote:
> > The BRAMs are still available when using the co-located multiplier, except for BRAMs in the widest data path mode. I.E. you can use the co-located BRAM in x1, x2, x4, x9 mode, but not x18.
>
> I'm pretty sure the widest mode is x36 and x18 is O.K. with co-located multiplier...
Yes, you are right. x18 works fine too; it is x36 that uses up all the fabric interface.

Philip

Philip Freidin
Fliptronics
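The corrected rule can be summarised as a small lookup. The port widths and depths listed are the standard aspect ratios of the 18 Kbit block RAM in Virtex-II and Virtex-II Pro; the function name is made up for this sketch, and it only encodes the rule as stated in this thread.

```python
# Port widths of an 18 Kbit block RAM and the matching depths.
BRAM_MODES = {1: 16384, 2: 8192, 4: 4096, 9: 2048, 18: 1024, 36: 512}

def multiplier_still_usable(port_width: int) -> bool:
    """True if the co-located multiplier stays usable, per the thread's rule."""
    if port_width not in BRAM_MODES:
        raise ValueError(f"unsupported port width: x{port_width}")
    # Only the widest mode (x36) takes the whole shared fabric interface.
    return port_width < 36

for width, depth in BRAM_MODES.items():
    status = "usable" if multiplier_still_usable(width) else "blocked"
    print(f"{depth:>5} x {width:<2} -> multiplier {status}")
```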
Philip Freidin wrote:

> Yes, you are right. x18 works fine too; it is x36 that uses up all the fabric interface.
And that problem is avoided in Virtex-4.

FWIW, the FIR filter core generator does fine with generic stuff, but when you have special considerations it often doesn't give you the best solution. Unfortunately, you can't get under the hood to tweak it at all, so your choices are to live with the core generator as is, or create your own filter from scratch (not hard to do, but time-consuming by the time you get through verification).

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

"They that give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759
Philip Freidin wrote:
> >> The BRAMs are still available when using the co-located multiplier, except for BRAMs in the widest data path mode. I.E. you can use the co-located BRAM in x1, x2, x4, x9 mode, but not x18.
> > I'm pretty sure the widest mode is x36 and x18 is O.K. with co-located multiplier...
> Yes, you are right. x18 works fine too; it is x36 that uses up all the fabric interface.
The problem is that I don't know exactly how the FFT core uses these BRAMs. I need to implement an FFT with 24-bit data and phase factors. Should I assume it uses the BRAMs in x36 mode? If that's true, I can't use these co-located BRAMs! ... but maybe the FFT core uses BRAMs co-located with its own multipliers. I'm just a little bit confused :-/
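Whether 24-bit words force x36 mode depends on how the memory is built, and nothing in this thread says how the FFT core does it. As a purely arithmetic sketch: a 24-bit word fits in one block RAM in x36 mode, or it can be split across several block RAMs in narrower modes. The function name and the 1024-entry depth are illustrative assumptions, not properties of the FFT core.

```python
import math

# Port widths of an 18 Kbit block RAM and the matching depths.
BRAM_MODES = {1: 16384, 2: 8192, 4: 4096, 9: 2048, 18: 1024, 36: 512}

def brams_needed(word_bits: int, depth: int, port_width: int) -> int:
    """Block RAMs needed for a word_bits x depth memory built from one mode."""
    bram_depth = BRAM_MODES[port_width]
    wide = math.ceil(word_bits / port_width)   # BRAMs side by side for width
    deep = math.ceil(depth / bram_depth)       # BRAMs stacked for depth
    return wide * deep

# Example: 24-bit words, 1024 entries (depth chosen only for illustration).
for width in (9, 18, 36):
    print(f"x{width}: {brams_needed(24, 1024, width)} block RAMs")
```

At this example depth, two block RAMs in x18 mode cost no more than the x36 arrangement, and by the rule stated earlier in the thread they would leave the co-located multipliers usable.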