FPGARelated.com

Making a 32KB BRAM block, virtex-4

Started by Bhanu Chandra February 25, 2007
Hi all,

I am working on a second-level cache for the MicroBlaze processor
on an ML403 board, which uses a Virtex-4 device. I am looking at
32KB of space for the cache data; the tags stay in a separate block. The
problem I am having is that each BRAM primitive holds only 2KB,
so I am in the uncomfortable situation of having to use
16 different variables.

Is there a way to combine the 16 primitives into a single 32KB
block RAM? If so, please give some details and links to
relevant information.

Thanks in advance,

Bhanu

"Bhanu Chandra" <vbhanu@gmail.com> wrote in message 
news:1172433285.638808.127220@8g2000cwh.googlegroups.com...
> Is there a way in which I can combine the 16 primitives and get a 32KB
> block ram?

Yes. Write some code (VHDL or Verilog) that instantiates 16 BRAMs and defines how you want them to be connected.

> If so, please specify some details and some links which
> have information regarding the same.

Decode the upper 4 bits of your 32K address space into a 'chip select' that selects which one of the 16 BRAMs you write to.

Those same 4 address bits are also used as the select input to a 16:1 mux, which chooses which BRAM's data output becomes the read data output of your 32K memory.

Kevin Jennings
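Kevin's depth-wise scheme can be sketched as a Python behavioral model (not synthesizable HDL). It assumes a 32-bit data word, which the thread does not state, so 32KB is 8192 words split into 16 banks of 512 words, one BRAM per bank. The class name `DepthBankedRam` is hypothetical.

```python
# Behavioral sketch (not HDL) of a 32KB memory built depth-wise from 16 BRAMs.
# ASSUMPTION: 32-bit data words, so 32KB = 8192 words = 16 banks x 512 words.
# DepthBankedRam is a hypothetical name used only for illustration.

NUM_BANKS = 16
WORDS_PER_BANK = 512   # 2KB per BRAM / 4 bytes per word (assumed width)
LOW_BITS = 9           # log2(512): address bits inside one BRAM
ADDR_BITS = 13         # log2(16 * 512): full word address


class DepthBankedRam:
    def __init__(self):
        # One list per BRAM; each entry holds one 32-bit word.
        self.banks = [[0] * WORDS_PER_BANK for _ in range(NUM_BANKS)]

    @staticmethod
    def split(addr):
        """Split a 13-bit word address into (bank select, address in BRAM)."""
        assert 0 <= addr < (1 << ADDR_BITS)
        return addr >> LOW_BITS, addr & (WORDS_PER_BANK - 1)

    def write(self, addr, data):
        sel, local = self.split(addr)
        # The upper 4 bits decode to a one-hot 'chip select':
        # only the write enable of bank `sel` is asserted.
        write_enables = [bank == sel for bank in range(NUM_BANKS)]
        for bank, we in enumerate(write_enables):
            if we:
                self.banks[bank][local] = data & 0xFFFFFFFF

    def read(self, addr):
        sel, local = self.split(addr)
        # All 16 BRAMs are read at the same local address...
        outputs = [bank[local] for bank in self.banks]
        # ...and a 16:1 mux driven by the same 4 bits picks one of them.
        return outputs[sel]


ram = DepthBankedRam()
ram.write(0x0200, 0xDEADBEEF)   # upper bits 0001 -> bank 1, local address 0
print(hex(ram.read(0x0200)))    # 0xdeadbeef
```

The decode logic and the 16:1 read mux are exactly the "external logic" that later replies in the thread try to minimize.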
> Yes, write some code (VHDL or Verilog) that instantiates 16 BRAMs and
> defines how you want them to be connected.
>
>> If so, please specify some details and some links which
>> have information regarding the same.
>
> Decode the upper 4 address bits into your 32K address space to use as like
> a 'chip select' which you would then use to select which one of the 16
> BRAMs you want to write to.
>
> Those same 4 address bits would also be used as the 'select' input to a
> 16->1 mux which would be used to select which BRAM data output is the
> 'read data' output of your 32K memory.
Or, for better timing, split the block RAMs bitwise. For example, you need 16 block RAMs, so if you want a 32-bit-wide memory, use the RAMB16_S2_S2 primitive (8K x 2, dual port) and assign 2 bits of your data bus to each memory. The addresses are common to all 16.

/MikeJ
www.fpgaarcade.com
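MikeJ's bit-wise split can be modeled the same way. This sketch assumes the 32-bit width from his example and treats each BRAM as one 8K x 2 port; it shows that no address decode and no read mux are needed, only wiring of 2-bit fields. `BitSlicedRam` is a hypothetical name.

```python
# Behavioral sketch (not HDL) of MikeJ's bit-wise split: 16 BRAMs, each
# organized 8K x 2 (as one port of RAMB16_S2_S2), forming 8K x 32 = 32KB.
# All slices share one address; slice i stores data bits [2i+1:2i].
# BitSlicedRam is a hypothetical name used only for illustration.

DEPTH = 8192        # 8K words, same address applied to every slice
SLICE_WIDTH = 2     # data bits per BRAM
NUM_SLICES = 16     # 16 x 2 = 32-bit data bus
SLICE_MASK = (1 << SLICE_WIDTH) - 1


class BitSlicedRam:
    def __init__(self):
        self.slices = [[0] * DEPTH for _ in range(NUM_SLICES)]

    def write(self, addr, data):
        assert 0 <= addr < DEPTH
        for i in range(NUM_SLICES):
            # No address decode: every slice is written, each with its own
            # 2-bit field of the data bus.
            self.slices[i][addr] = (data >> (SLICE_WIDTH * i)) & SLICE_MASK

    def read(self, addr):
        assert 0 <= addr < DEPTH
        word = 0
        for i in range(NUM_SLICES):
            # No read mux either: the word is just the concatenation of the
            # 2-bit outputs of all slices.
            word |= self.slices[i][addr] << (SLICE_WIDTH * i)
        return word


ram = BitSlicedRam()
ram.write(8191, 0xDEADBEEF)
print(hex(ram.read(8191)))   # 0xdeadbeef
```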
KJ wrote:

> "Bhanu Chandra" <vbhanu@gmail.com> wrote in message
> news:1172433285.638808.127220@8g2000cwh.googlegroups.com...
>
>>Is there a way in which I can combine the 16 primitives and get a 32KB
>>block ram?
>
> Decode the upper 4 address bits into your 32K address space to use as like a
> 'chip select' which you would then use to select which one of the 16 BRAMs
> you want to write to.
>
> Those same 4 address bits would also be used as the 'select' input to a
> 16->1 mux which would be used to select which BRAM data output is the 'read
> data' output of your 32K memory.
>
> Kevin Jennings
It would be better to set the BRAMs up as 16Kx1, using as many as you need for the data width, and then a simple 2:1 mux to select between two banks for the 32K depth. This eliminates a lot of the external logic by making more use of the BRAMs' internal address decoding. As a result, you get considerably better timing and lower power dissipation. It also turns out to be much easier to route, because each BRAM has only one read-data bit and one write-data bit rather than the full width of the BRAM.
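Ray's 16Kx1 organization, again as a Python behavioral model. ASSUMPTION: the data width is a parameter (8 bits by default, so 32K x 8 = 32KB uses 16 BRAMs); the thread does not fix it. Only the address MSB needs external decode, and each data bit needs one 2:1 mux. `Deep1BitRam` is hypothetical.

```python
# Behavioral sketch (not HDL) of 16Kx1 BRAMs: one BRAM per data bit per bank,
# two banks deep, with a 2:1 read mux per bit to reach 32K words.
# ASSUMPTION: data width is a parameter (default 8 -> 32K x 8 = 32KB).
# Deep1BitRam is a hypothetical name used only for illustration.

BANK_DEPTH = 16 * 1024   # 16Kx1 aspect ratio
BANK_BITS = 14           # log2(16K): address bits fed to every BRAM
NUM_BANKS = 2            # 2 x 16K = 32K words


class Deep1BitRam:
    def __init__(self, width=8):
        self.width = width
        # brams[bank][bit] models one 16Kx1 BRAM as a list of 0/1 values.
        self.brams = [[[0] * BANK_DEPTH for _ in range(width)]
                      for _ in range(NUM_BANKS)]

    def bram_count(self):
        return NUM_BANKS * self.width

    @staticmethod
    def split(addr):
        assert 0 <= addr < NUM_BANKS * BANK_DEPTH
        # Only the MSB needs external decode; the low 14 bits are decoded
        # inside each BRAM.
        return addr >> BANK_BITS, addr & (BANK_DEPTH - 1)

    def write(self, addr, data):
        bank, local = self.split(addr)
        for bit in range(self.width):
            # Each BRAM has a single write-data input: one bit of `data`.
            self.brams[bank][bit][local] = (data >> bit) & 1

    def read(self, addr):
        bank, local = self.split(addr)
        word = 0
        for bit in range(self.width):
            # Per data bit, a 2:1 mux picks bank 0's or bank 1's output.
            low = self.brams[0][bit][local]
            high = self.brams[1][bit][local]
            word |= (high if bank else low) << bit
        return word


ram = Deep1BitRam(width=8)
ram.write(0x4000, 0xA5)    # MSB set -> bank 1, local address 0
ram.write(0x0000, 0x3C)    # MSB clear -> bank 0, same local address
print(ram.bram_count(), hex(ram.read(0x4000)), hex(ram.read(0x0000)))
# 16 0xa5 0x3c
```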
"Ray Andraka" <ray@andraka.com> wrote in message 
news:8CrEh.5899$qr5.1565@newsfe19.lga...
> It would be better to set the BRAMs up as 16Kx1...
Mike, Ray,

Absolutely right. I must have had my 'slow performance' hat on for some reason.

Kevin
Ray Andraka wrote:
> It would be better to set the BRAMs up as 16Kx1, using as many as you
> need for bits and then a simple 2:1 mux to select between two banks for
> the 32K size. This eliminates a lot of the external logic by using more
> of the internal decode. As a result you get considerably better timing
> and power dissipation. Also, it turns out it is much easier to route
> because each BRAM has only one read data and one write data rather than
> the full width of the BRAM.
Just a note for the archive: I agree with Ray, but if power consumption is a consideration, you need to try both implementations. In some architectures, power consumption is strongly affected by the number of enabled BRAMs.
Tim wrote:
> Ray Andraka wrote:
>
>> It would be better to set the BRAMs up as 16Kx1, using as many as you
>> need for bits and then a simple 2:1 mux to select between two banks
>> for the 32K size. This eliminates a lot of the external logic by
>> using more of the internal decode. As a result you get considerably
>> better timing and power dissipation. Also, it turns out it is much
>> easier to route because each BRAM has only one read data and one write
>> data rather than the full width of the BRAM.
>
> Just a note for the archive: I agree with Ray, but if power consumption
> is a consideration, you need to experiment with both implementations.
> Some architectures have power burn strongly affected by the number of
> enabled BRAMs.
Either way, you have the same number of BRAMs, unless your data bus is the right width to take advantage of the parity bits in the wider configurations and reduce the BRAM count. I don't recall what the OP's width was, but I was thinking it was 16 bits, in which case the parity bits aren't used. In any event, the extra logic needed to mux 32 banks of 16/18-bit-wide BRAMs, rather than the 2:1 mux needed to select between pairs of 16Kx1 banks, is going to consume far more power than an additional BRAM. I didn't mention it in my original post, but the 16Kx1 implementation also uses fewer logic resources. Generally speaking, you want to use the deepest aspect ratio that fits your design. The exceptions are special cases where the number of available BRAMs is limited and using the parity bits reduces the BRAM count.
Ray Andraka wrote:

> In any event, the extra logic needed to mux 32 banks of 16/18 bit wide
> BRAMs rather than the 2:1 mux needed to select from pairs of 16Kx1 banks
> is going to consume far more power than an additional BRAM.
What you say may be true in almost all cases; I haven't done the comparison across the many and various FPGA families. But it certainly isn't true for at least one Xilinx family. If power consumption is an issue, it's worth making the checks and knowing for sure.
Tim wrote:

> Ray Andraka wrote:
>
>> In any event, the extra logic needed to mux 32 banks of 16/18 bit wide
>> BRAMs rather than the 2:1 mux needed to select from pairs of 16Kx1
>> banks is going to consume far more power than an additional BRAM.
>
> What you say may be true in almost all cases - I haven't done the
> comparison across the many and various FPGAs. But it certainly isn't
> true for at least one Xilinx family - if power consumption is an issue
> it's worth making the checks and knowing for sure.
Tim,

I am failing to see it. If you are building a 16x32K memory, for example, you could do it with 32 16Kx1 BRAMs plus 16 2:1 muxes, or with 32 1Kx18 BRAMs plus 16 32:1 muxes, which occupy about 3x the number of LUTs. Same number of BRAMs, more logic.

If you have a width where you use the parity bits, then yes, there is a difference in the BRAM count: for example, for an 18x32K memory you use 36 16Kx1 or 32 1Kx18 BRAMs. In that case, yes, you use 4 more BRAMs to save a relatively small number of LUTs, and the power consumption is probably less.

In the general case, though, the answer depends on the candidate BRAM organizations, and the difference only arises because the extra (parity) memory is available only in the x9, x18, and x36 configurations.
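The BRAM counts in this post can be checked with a short calculation. This is a sketch: it counts BRAMs and read-mux inputs only, not LUTs or power, and the tie-breaking rule in the hypothetical helper `deepest_fewest` is an assumption that encodes "use the deepest aspect ratio that fits". Note that 32 banks of 1Kx18 imply a 32:1 read mux per data bit.

```python
# Sketch: BRAM count and read-mux size for a memory of `depth` words x
# `width` bits built from a single Virtex-4 block RAM aspect ratio.
# The (depth, width) list below is the 18Kbit block RAM's set of
# configurations; only the x9, x18 and x36 widths include the parity bits.
import math

ASPECTS = [(16384, 1), (8192, 2), (4096, 4), (2048, 9), (1024, 18), (512, 36)]


def bram_plan(depth, width, aspect):
    """Return (number of BRAMs, N for the N:1 read mux per data bit)."""
    bram_depth, bram_width = aspect
    banks = math.ceil(depth / bram_depth)     # BRAMs stacked for depth
    columns = math.ceil(width / bram_width)   # BRAMs side by side for width
    return banks * columns, banks             # banks == 1 means no mux


def deepest_fewest(depth, width):
    # Hypothetical helper: fewest BRAMs first, then the deepest aspect ratio
    # (smallest read mux) to break ties.
    return min(ASPECTS, key=lambda a: (bram_plan(depth, width, a)[0], -a[0]))


for w in (16, 18):
    for aspect in [(16384, 1), (1024, 18)]:
        count, mux = bram_plan(32 * 1024, w, aspect)
        print(f"{w}x32K from {aspect[0]}x{aspect[1]}: {count} BRAMs, {mux}:1 mux")
```

With this rule, `deepest_fewest(32 * 1024, 16)` returns `(16384, 1)`, matching Ray's rule of thumb. For an 18-bit width it returns `(2048, 9)`: also 32 BRAMs, but with a 16:1 rather than a 32:1 read mux.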
"Ray Andraka" <ray@andraka.com> wrote in message 
news:aB3Fh.219671$IL1.61458@newsfe13.lga...
> Tim, I am failing to see it.
> If you are building a 16x32K memory, for example, you could do it with 32
> 16Kx1 plus 16 2:1 muxes, or you could do it with 32 1Kx18 BRAMs plus 16
> 32:1 muxes, which occupies about 3x the number of LUTs. Same number of
> BRAMs, more logic.
>
> If you have a width where you use the parity bits, then yes there is a
> difference in the BRAM count, for example an 18x32K you use 36 16Kx1 or 32
> 1kx18. In that case, then yes you use 4 more BRAMs, to save a relatively
> small number of LUTs, and the power consumption is probably less.
>
> In the general case though, the answer depends on the candidate BRAM
> organizations, and this is only true because the extra memory density is
> only available in the x9, x18, and x36 configurations.
The point I saw in his earlier post: if you have 32 16Kx1 memories, you're enabling at least 16 of them on every access. If you use 32 1Kx16 memories, only one needs to be enabled. If the power for the 15 extra enabled RAM accesses (versus disabled cycles) is significantly greater than the power of the multiplexer logic and the increased routing burden, the power question isn't a gimme. If all 32 memories are always enabled in both schemes, the point is moot; the 16Kx1s will win out. If only the decoded memory is enabled, the difference might be large. Or it might not.

- John_H
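John_H's enable-count argument can be put in numbers. This sketch assumes the 16-bit width used above and, when `gate_enables` is true, that BRAM enables are driven by the bank decode (an assumption; a design could equally tie all enables high). It counts enabled BRAMs only; whether the extra enabled accesses outweigh the mux logic has to be measured on the target device.

```python
# Sketch: how many BRAMs have their enable asserted on one access, for a
# 32K-word memory built from one aspect ratio. This counts enables only;
# it does not estimate power, which depends on the device family.
import math


def enables_per_access(bram_depth, bram_width, depth=32 * 1024, width=16,
                       gate_enables=True):
    """Return (BRAMs enabled per access, total BRAMs)."""
    banks = math.ceil(depth / bram_depth)
    columns = math.ceil(width / bram_width)
    total = banks * columns
    # With enables driven by the bank decode, only the addressed bank's
    # columns are active; otherwise every BRAM is enabled every cycle.
    active = columns if gate_enables else total
    return active, total


print(enables_per_access(16384, 1))                      # (16, 32)
print(enables_per_access(1024, 18))                      # (1, 32)
print(enables_per_access(1024, 18, gate_enables=False))  # (32, 32)
```

The gap between 16 and 1 enabled BRAMs is the 15 extra enabled accesses in the post; with `gate_enables=False` both schemes enable all 32, which is the "moot" case.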