FPGARelated.com

Altera's altsyncram MAXIMUM_DEPTH

Started by Peter Sommerfeld November 17, 2003
What does this generic mean?

I am wondering if I am missing out on a possible memory optimization.

Altera's docs are decidedly vague and a search on their website brings up nothing.

-- Pete
Hi Peter,

> I am wondering if I am missing out on a possible memory optimization.
Yes, you are.

Quartus allocates memory depth-first; a 512x8-bit memory therefore uses two M4Ks in 512x4 mode. If your memory width and depth are powers of two, the allocation order doesn't matter except for some speed details. But a 700x8-bit memory is much better allocated by width than by depth (because only 3 M4Ks are needed for the former compared to 4 for the latter). See http://www.altera.com/support/kdb/rd03292002_9305.html for further details. MAXIMUM_DEPTH should let you force Quartus not to waste this additional memory block.

Unfortunately it doesn't work, not even the way Altera thinks it should. I had a long (and somewhat bizarre) service request, the last entry of which was the following:

-- Altera wrote
This is to let you know that a software problem request has been filed in order to reflect this issue. I will let you know as soon as the software group gets back to me with any information or when a resolution is made.
-- Altera wrote much more, but [snip]

This was written on the 25th of August, and the service request was closed without further comment. About a month ago I posted an additional request asking for the current state of the problem request and did not receive any answer. Either Altera doesn't care, or they don't want to acknowledge this as an issue before they are able to ship the new Quartus 4.0 (hopefully fixing this and a lot of other things). Who knows?

If anyone in the group thinks he can help on this topic or has further details, I would be thankful to hear about it, as Quartus wastes a lot of my memory and this has to change!

I have to say that life with Altera mySupport is very ambiguous to me. Answers are generally quick and friendly (which is already a lot) but generally only helpful when problems are simple. Whenever the problem gets more complex or there is a bug, things get very slow (or even stop).

Regards, Manfred

BTW: "Release notes for Service Pack 2 will be released on Friday, October 24, 2003."
(seen on https://www.altera.com/support/software/download/service_packs/quartus/dnl-qii30sp2.jsp on 17 November)

======= Service Request Detail (reordered for your convenience)
Request #: 10363308
Status: Closed
Date Opened (PDT): 8/19/03 9:03 AM
Date Closed (PDT): 9/4/03 6:52 PM
Inquiry Type: Product Question
Device Family: CYCLONE
Device:
Title: FIFO implementation size

Description: I have created a 1300-word by 8-bit FIFO (scfifo). The implementation of this needs 16384 memory bits. Why?

The FIFO size should result in about 1300x8 = 10400 memory bits. As the block size of the embedded RAM in Cyclone is 4096 bits, which can be organized as 512x8, I expect Quartus to use three M4Ks, resulting in 4096x3 = 12288 bits. Obviously it uses a fourth block. Why?

Regards, Manfred

------ 8/19/03 3:17 PM
To Customer
Hello Manfred,

This is to let you know that I am currently looking into this. I will let you know as soon as I am able to verify the problem as you have described it and come to a resolution.

------ 8/19/03 4:20 PM
To Customer
Hello Manfred,

Since 1300 is larger than 1K, it'll use 2Kx2 mode for best performance. To get the x8 width in that mode you'll need 4 M4Ks. Click Custom (on page 6 of 8 of the MegaWizard); you then get an option to set Maximum depth, and if you set it to 512, it'll use that mode and should only need 3 M4Ks.

For more information on this, you may refer to the following link:
http://www.altera.com/support/kdb/rd03292002_9305.html

------ 8/20/03 12:36 AM
From Customer
Hello Marlon,

thanks for your quick and helpful reply. Now the behaviour of Quartus is clear to me. Unfortunately, setting the parameter max. block depth to 512 in the MegaWizard Plug-In Manager as you proposed does not result in a smaller memory consumption. I have attached the packed project for your convenience.

Setting this parameter adds the following line to the scfifo instantiation code:

    maximum_depth => 512,

However, this parameter is not described in the Quartus II help page for the scfifo megafunction. Why?

Regards, Manfred

------ 8/20/03 9:47 AM
To Customer
Hello Manfred,

The MAXIMUM_DEPTH parameter is an internal parameter, so there won't be any information on it in the Quartus II Help or the MegaWizard.

------ 8/20/03 11:26 PM
From Customer
Hello Marlon,

again: unfortunately, setting the parameter max. block depth to 512 in the MegaWizard Plug-In Manager as you proposed does NOT result in a smaller memory consumption. Why? Please check with the attached project file.

Regards, Manfred

------ 8/21/03 5:08 PM
To Customer
Hello Manfred,

Sorry for the inconvenience, but actually, in order to get the x8 mode you'll need 4 M4Ks.

------ 8/21/03 11:49 PM
From Customer
Hello Marlon,

could you please explain why it is not possible to implement a 1300x8 FIFO in 3 M4K blocks? This information is the opposite of both your first advice and the mentioned support database page (http://www.altera.com/support/kdb/rd03292002_9305.html). What exactly is the parameter maximum block depth for, then?

Regards, Manfred

------ 8/25/03 6:50 PM
To Customer
Hello Manfred,

This is to let you know that a software problem request has been filed in order to reflect this issue. I will let you know as soon as the software group gets back to me with any information or when a resolution is made.
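To make the block arithmetic in the service request concrete, here is a small Python model. It is a sketch under stated assumptions, not Quartus's actual allocator: it assumes the Cyclone M4K data-only modes from 4096x1 down to 128x32 and a simple "round the depth up to a power of two, optionally capped by MAXIMUM_DEPTH" rule. The names `m4k_blocks` and `next_pow2` are hypothetical. Under those assumptions it gives 4 blocks (16384 bits) for the 1300x8 FIFO by default, and 3 blocks (12288 bits) with the depth capped at 512, the count Manfred expected.

```python
import math
from typing import Optional, Tuple

# Assumed Cyclone M4K geometry, data bits only (the 512 parity bits are ignored).
M4K_BITS = 4096
M4K_MIN_DEPTH = 128   # shallowest data-only mode: 128 x 32
M4K_MAX_DEPTH = 4096  # deepest mode: 4096 x 1


def next_pow2(n: int) -> int:
    """Smallest power of two that is >= n (n >= 1)."""
    return 1 << (n - 1).bit_length()


def m4k_blocks(depth: int, width: int,
               maximum_depth: Optional[int] = None) -> Tuple[int, str]:
    """Return (M4K count, block mode) for a depth x width memory.

    Hypothetical rule: round the depth up to a power of two ("depth first"),
    cap it at MAXIMUM_DEPTH if given, clamp it to a legal M4K depth, then tile
    the memory with blocks of that shape.
    """
    block_depth = next_pow2(depth)
    if maximum_depth is not None:
        block_depth = min(block_depth, maximum_depth)
    block_depth = max(M4K_MIN_DEPTH, min(block_depth, M4K_MAX_DEPTH))
    block_width = M4K_BITS // block_depth
    blocks = math.ceil(depth / block_depth) * math.ceil(width / block_width)
    return blocks, f"{block_depth}x{block_width}"


if __name__ == "__main__":
    # The 1300 x 8 FIFO from the service request.
    for cap in (None, 512):
        n, mode = m4k_blocks(1300, 8, cap)
        print(f"MAXIMUM_DEPTH={cap}: {n} M4Ks in {mode} mode = {n * M4K_BITS} bits")
```

The default case lands in 2048x2 mode, matching the support engineer's "2kx2 mode" explanation; the cap forces 512x8 mode and three depth slices.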
MAXIMUM_DEPTH controls the underlying RAM block size that will be used to construct the user's altsyncram memory. By default, the altsyncram megafunction will round the memory depth up to the next power of 2 and use that as the RAM block size. For example, if you ask for a 3K-word memory, altsyncram will normally construct it from 4K-word RAM blocks, because this gives the best performance. If you are running short of RAM blocks, you could specify MAXIMUM_DEPTH=1024 for this example, and the altsyncram megafunction will construct the 3K memory from 1K-word RAM blocks, which may use 1/4 fewer RAM blocks. The penalty is that the 3K-word memory constructed from 1K-word RAM blocks needs LEs to mux and de-mux the data, and will also run slower as a result.

In summary, MAXIMUM_DEPTH is a control to increase memory efficiency for non-power-of-2 memory depths, at the cost of lower memory performance and a few LEs to stitch the smaller RAM blocks together. MAXIMUM_DEPTH can only take power-of-2 values, with 32 being the smallest meaningful value, since it corresponds to the shallowest M512 memory block configuration.

- Subroto Datta
Altera Corp.
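The muxing penalty described above can be illustrated with a behavioural Python model of a 3K-word memory built from equal-depth slices. This is a hypothetical sketch (`SlicedRam` is an invented name), not the logic altsyncram actually generates. It only shows that 4K-word slices leave a single slice with no decode, while 1K-word slices need a slice select taken from the upper address bits and a 3-to-1 read mux.

```python
from typing import List, Tuple


class SlicedRam:
    """Behavioural model of one memory built from several equal-depth RAM slices.

    Hypothetical sketch: the upper address bits pick a slice (write-enable
    decode on writes, output mux select on reads); the lower bits address the
    word inside that slice. This decode/mux is what costs LEs and speed.
    """

    def __init__(self, total_depth: int, slice_depth: int) -> None:
        self.total_depth = total_depth
        self.slice_depth = slice_depth
        n_slices = -(-total_depth // slice_depth)  # ceiling division
        self.slices: List[List[int]] = [[0] * slice_depth for _ in range(n_slices)]

    def split(self, addr: int) -> Tuple[int, int]:
        """Return (slice select, offset inside the slice) for an address."""
        if not 0 <= addr < self.total_depth:
            raise IndexError(f"address {addr} out of range")
        return divmod(addr, self.slice_depth)

    def write(self, addr: int, data: int) -> None:
        sel, offset = self.split(addr)
        self.slices[sel][offset] = data  # only the selected slice is write-enabled

    def read(self, addr: int) -> int:
        sel, offset = self.split(addr)
        return self.slices[sel][offset]  # n-to-1 mux across the slice outputs


if __name__ == "__main__":
    for slice_depth in (4096, 1024):  # default block size vs MAXIMUM_DEPTH=1024
        ram = SlicedRam(3 * 1024, slice_depth)
        ram.write(2500, 0xAB)
        print(f"{slice_depth}-word slices: {len(ram.slices)} slice(s), "
              f"address 2500 -> (slice, offset) {ram.split(2500)}")
```

In hardware the slice select and the read mux are the LEs Subroto mentions, and the mux sits in the read data path, which is why the sliced version runs slower.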
Hi Manfred, Subroto:

Thank you very much for your in-depth replies. I'm happy to see that
MAXIMUM_DEPTH does what I was hoping it does, because I need many RAMs
with non-power-of-2 storage sizes, and I'm feeling a little too lazy to
write my own muxing logic.

Manfred, I compiled a design that had one depth-first and one
width-first RAM block, each being 1,089 x 32 bits. The depth-first
one used 16 M4Ks as 4096x2, and the width-first one used 9 M4Ks as 128x32,
so the functionality appears to be working for me. Perhaps certain
memory configurations work properly with MAXIMUM_DEPTH, while others
(i.e. yours) do not?
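The two results can be compared by how many of the allocated bits actually hold data. The short Python calculation below uses only the block counts reported above (16 and 9) and assumes 4096 data bits per M4K; `utilization` is a hypothetical helper. It works out to roughly 53% versus 95% utilization.

```python
M4K_BITS = 4096  # data bits per M4K block, parity bits ignored (assumption)


def utilization(depth: int, width: int, blocks: int, block_bits: int = M4K_BITS) -> float:
    """Fraction of the allocated block bits that actually hold user data."""
    return (depth * width) / (blocks * block_bits)


if __name__ == "__main__":
    # Block counts as reported for a 1,089 x 32 RAM.
    for label, blocks in (("depth-first", 16), ("width-first", 9)):
        print(f"{label}: {blocks} M4Ks, {utilization(1089, 32, blocks):.1%} of bits used")
```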

As expected, the critical path was in the width-first logic, but it was
still 220 MHz+.

I am using Quartus II 3.0 SP2. I found the release notes at
http://www.altera.com/literature/rn/rn_qts.pdf.

Thanks again,

-- Pete

Hi Manfred, Peter,

The MAXIMUM_DEPTH description in my previous reply applies to the altsyncram megafunction, and indirectly to the scfifo and dcfifo megafunctions. The FIFO megafunctions do not support non-power-of-2 depths, so the memory example I gave does not apply. In Quartus II 4.0, the FIFO MegaWizard plug-in will not allow you to enter non-power-of-2 depths.

The only reason for specifying a MAXIMUM_DEPTH parameter in a FIFO megafunction in pre-4.0 versions of Quartus would be to enforce a smaller RAM block size to give the fitter added freedom. MAXIMUM_DEPTH values of 128, 256, and 512 can fit in either M512 blocks or M4K blocks. A MAXIMUM_DEPTH value of 4096 can fit in either an M4K block or an M-RAM.

Here's an example: I have a 2K-word FIFO, and I don't care whether it goes into M4K blocks or M512 blocks. If I set MAXIMUM_DEPTH=512, the FIFO will be constructed from 512-word RAM slices, which gives the fitter the flexibility to place the FIFO in either M512 blocks or M4K blocks.

- Subroto Datta
Altera Corp.
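The list of which MAXIMUM_DEPTH values fit which block types follows from the range of depths each block type supports. The Python sketch below assumes the data-only Stratix-family block modes (M512 from 512x1 to 32x16, M4K from 4096x1 to 128x32, M-RAM from 64Kx8 to 4Kx128). These ranges are an assumption, chosen because they agree with the values quoted above, and the helper names are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed data-only depth ranges of the Stratix TriMatrix blocks (parity ignored):
# M512: 512x1 ... 32x16, M4K: 4096x1 ... 128x32, M-RAM: 64Kx8 ... 4Kx128.
BLOCK_DEPTHS: Dict[str, Tuple[int, int]] = {
    "M512": (32, 512),
    "M4K": (128, 4096),
    "M-RAM": (4096, 65536),
}


def candidate_blocks(maximum_depth: int) -> List[str]:
    """Block types that can hold one slice of `maximum_depth` words."""
    if maximum_depth < 32 or maximum_depth & (maximum_depth - 1):
        raise ValueError("MAXIMUM_DEPTH must be a power of two >= 32")
    return [name for name, (lo, hi) in BLOCK_DEPTHS.items() if lo <= maximum_depth <= hi]


def slice_count(fifo_depth: int, maximum_depth: int) -> int:
    """Number of maximum_depth-word slices needed for the FIFO storage."""
    return -(-fifo_depth // maximum_depth)  # ceiling division


if __name__ == "__main__":
    for md in (128, 256, 512, 1024, 4096):
        print(md, candidate_blocks(md))
    print("2K-word FIFO, MAXIMUM_DEPTH=512:", slice_count(2048, 512), "slices")
```

Under these assumptions a 512-word slice is legal in both M512 and M4K, so the 2K-word FIFO becomes four slices the fitter may place in either block type, while 1024 fits only an M4K.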
Hi Subroto,

> The FIFO megafunctions do not support non-power-of-2 depths, so the memory example I gave does not apply.
This is a very clear answer to a very long service request; it would have saved me a lot of time if I had got the very same answer from Altera mySupport. Instead they left me with a dangling service request and the information that there is a potential bug in Quartus. Could you look into that, or share your knowledge with your support team? I would appreciate getting an official answer from mySupport that really closes my service request.

BTW: Why do you restrict FIFO depths to powers of two? Supporting other depths would allow trading memory usage against implementation speed (like with altsyncram).

Regards, Manfred
Manfred Mücke wrote:

> BTW: Why do you restrict FIFO depths to powers of two? That would allow
> trading memory usage versus implementation speed (like with altsyncram).

Probably because FIFO storage is based on a RAM, and RAM comes in increments of one address bit. As Subroto said, the extra space from altsyncram MAXIMUM_DEPTH to the top could not be used as RAM in any case.

-- Mike Treseler
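The "increments of one address bit" argument can be illustrated with FIFO read/write pointers. One plausible hardware reason for power-of-two FIFO depths (an assumption here, not Altera's stated rationale) is that the pointer can then be a plain binary counter that wraps by overflow, whereas any other depth needs a compare-and-reset on every increment. The Python sketch below models both; the function names are hypothetical.

```python
from typing import Callable, List


def next_ptr_pow2(ptr: int, depth: int) -> int:
    """Pointer increment for a power-of-two depth: the carry-out is simply dropped."""
    if depth <= 0 or depth & (depth - 1):
        raise ValueError("depth must be a power of two")
    return (ptr + 1) & (depth - 1)  # a plain binary counter, no compare needed


def next_ptr_any(ptr: int, depth: int) -> int:
    """Pointer increment for an arbitrary depth: needs a compare and a reset to 0."""
    return 0 if ptr == depth - 1 else ptr + 1


def walk(step: Callable[[int, int], int], depth: int, n: int) -> List[int]:
    """First n pointer values, starting from 0."""
    ptr, seq = 0, []
    for _ in range(n):
        seq.append(ptr)
        ptr = step(ptr, depth)
    return seq


if __name__ == "__main__":
    print(walk(next_ptr_pow2, 8, 10))  # wraps 7 -> 0 by masking
    print(walk(next_ptr_any, 6, 10))   # wraps 5 -> 0 by explicit compare
```

For a dual-clock FIFO such as dcfifo there is a further consideration: pointers are commonly passed between clock domains in Gray code, and a plain Gray-code counter changes only one bit across the wrap when its length is a power of two. Neither point makes other depths impossible; they only make them cost extra logic.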
>> BTW: Why do you restrict FIFO depths to powers of two? That would allow
>> trading memory usage versus implementation speed (like with altsyncram).
> Probably because FIFO storage is based on a ram,
> and ram comes in increments of one address bit.
True as long as the size of the memory/FIFO is smaller than the memory blocks available in the device. A Cyclone, for example, uses M4K memory blocks with 4096 bits each (as the name suggests). So for RAMs/FIFOs < 4096 bits you will always pay for a full M4K (as long as they are implemented in memory blocks), but for RAMs/FIFOs > 4096 bits the M4K block is the smallest building unit, allowing you to implement a RAM/FIFO using 3x4096 = 12288 bits from 3 M4K blocks (depending on the FIFO width). Because address decoding is easier when aligning by depth, and to improve speed, it can make sense to use more (four in our example) M4K blocks and waste some memory, but it is by no means a necessity.

This is a limitation which does not apply to RAM but only to FIFOs, and it will be introduced in Quartus 4.0, as Subroto said. However, RAM and FIFOs are both implemented in the very same memory blocks, so it's up to the Wizard/module designer to allow or restrict the depth. Restricting FIFO depths to powers of two is a choice, not a must, as long as there is no special FIFO-RAM block. My question was why this limitation, which restricts potential savings in memory bit consumption, will be introduced.

Regards, Manfred
Followup to:  <opryznd7oggdoir8@news.inode.at>
By author:    Manfred Mücke <manfred.getmuecke@ridgmxof.thisat>
In newsgroup: comp.arch.fpga
There is another issue, which is that the RAMs are actually 4608 bits, not 4096. I have seen Quartus refuse to use those extra bits in situations where it could have, because it prefers to organize by depth, and there is apparently no way to work around this. I would really like to see:

(a) support for non-power-of-two memory sizes;
(b) the ability to optimize for RAM consumption at the expense of timing.

This was an issue in particular when I tried to create a 16384 x 9-bit ROM, and yes, I needed all 9 bits...

-hpa
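The 16384 x 9 ROM shows what leaving the parity bits unused can cost. The Python arithmetic below assumes the M4K mode list from the Cyclone/Stratix documentation (six data-only modes plus 512x9, 256x18 and 128x36, which use all 4608 bits) and simply picks the mode needing the fewest blocks; it is not how Quartus decides, and the helper names are hypothetical. Under those assumptions the best data-only tiling is 36 blocks in 4096x1 mode, whereas 512x9 mode needs exactly 32 blocks, since 32 x 4608 = 16384 x 9 = 147456 bits.

```python
import math
from typing import List, Tuple

Mode = Tuple[int, int]  # (depth, width) of one M4K configuration

# Assumed M4K modes. The x9/x18/x36 modes also use the 512 parity bits,
# i.e. all 4608 bits of the block.
DATA_MODES: List[Mode] = [(4096, 1), (2048, 2), (1024, 4), (512, 8), (256, 16), (128, 32)]
PARITY_MODES: List[Mode] = [(512, 9), (256, 18), (128, 36)]


def blocks_for(depth: int, width: int, mode: Mode) -> int:
    """M4Ks needed to tile a depth x width memory with blocks in the given mode."""
    mode_depth, mode_width = mode
    return math.ceil(depth / mode_depth) * math.ceil(width / mode_width)


def best_mode(depth: int, width: int, modes: List[Mode]) -> Tuple[int, Mode]:
    """(block count, mode) of the mode that needs the fewest blocks."""
    return min((blocks_for(depth, width, m), m) for m in modes)


if __name__ == "__main__":
    # The 16384 x 9 ROM
    print("data-only modes:", best_mode(16384, 9, DATA_MODES))
    print("with x9 modes:  ", best_mode(16384, 9, DATA_MODES + PARITY_MODES))
```

The price of the 32-block solution is a 32-to-1 read mux across 512-word slices, which is exactly the RAM-versus-timing trade-off requested in (b).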
Hi Subroto,

I would like to renew my question: Why do you restrict FIFO depths to 
powers of two? I can't see the need for that.

Regards, Manfred