FPGARelated.com
Forums

How many Altera LE's to Xilinx Slices????

Started by Guitarman October 15, 2004
Hello All,

I've been designing with Xilinx FPGAs for a while so I'm used to the
"Slice" concept. I'm looking at Altera's Max II as a nice possible
solution for a design.

I took my VHDL code and it synthesized to 40 Slices in a Spartan III.
Then I took the same code and synthesized it for a Max II (using
Quartus II now) and it was 71 LE's.

I realize a blanket statement like "71 LE's (approx. =) 40 Slices" is totally
dependent on how the code is synthesized.

But is an approximate 1 Slice = 2 LE's a pretty close all-around
estimate?

Thanks
Eric
Hi Eric,

> But is an approximate 1 Slice = 2 LE's a pretty close all-around
> estimate?

Give or take ~10% as a design-dependent margin and you should be OK.

Best regards,
Ben
Guitarman wrote:
> I took my VHDL code and it synthesized to 40 Slices in a Spartan III.
> Then I took the same code and synthesized it for a Max II (using
> Quartus II now) and it was 71 LE's.
>
> But is an approximate 1 Slice = 2 LE's a pretty close all-around
> estimate?

The problem is not a hardware issue, but a granularity issue. Slices are
not a good measure of how much logic your design is using. Slices have two
LUTs and two FFs. If even one FF is used, the whole slice is counted as
used.

You are better off determining how many LUTs and FFs are used in each
design. Those counts are much more comparable, although there will be
family-dependent differences in how well the designs can pack into the
larger granules. Mostly the newer parts will pack logic and FFs more
densely than the older parts.

-- 
Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design      URL http://www.arius.com
4 King Ave                               301-682-7772 Voice
Frederick, MD 21701-3110                 301-682-7666 FAX
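To make the granularity point concrete, here is a rough Python sketch that bounds the slice count a tool might report for a given number of LUTs and FFs. The design numbers are made up for illustration, and the only architectural fact assumed is the one stated above (two LUTs and two FFs per slice). Real packing lands somewhere between the two bounds.

```python
from dataclasses import dataclass
from math import ceil


@dataclass
class Utilization:
    """LUT and FF counts taken from a synthesis report (hypothetical here)."""
    luts: int
    ffs: int


def min_slices(u: Utilization, luts_per_slice: int = 2, ffs_per_slice: int = 2) -> int:
    """Lower bound: every slice is filled as completely as possible."""
    return max(ceil(u.luts / luts_per_slice), ceil(u.ffs / ffs_per_slice))


def max_slices(u: Utilization) -> int:
    """Upper bound: every LUT and every FF ends up alone in its own slice."""
    return u.luts + u.ffs


# Made-up design: 50 LUTs and 30 FFs.
design = Utilization(luts=50, ffs=30)
print("slices reported could range from", min_slices(design), "to", max_slices(design))

# A single FF with nothing else still costs one whole slice.
print("one lone FF ->", min_slices(Utilization(luts=0, ffs=1)), "slice")
```

The spread between the bounds is why a slice count alone says little about how much logic is really in use, while the LUT and FF counts are the same on either bound.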
Followup to:  <90282e35.0410151112.77a87654@posting.google.com>
By author:    ericjohnholland@hotmail.com (Guitarman)
In newsgroup: comp.arch.fpga
> But is an approximate 1 Slice = 2 LE's a pretty close all-around
> estimate?

Well, given that 1 slice = 2 LUTs + 2 FFs + some more logic, and
1 LE = 1 LUT + 1 FF + some more logic, it would be expected.

    -hpa
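As a quick sanity check on that reasoning, the sketch below compares the nominal 2 LEs per slice with the figures from the original post (40 slices, 71 LEs). The function names are just illustrative; nothing here comes from a vendor tool.

```python
NOMINAL_LES_PER_SLICE = 2  # 1 slice = 2 LUTs + 2 FFs, 1 LE = 1 LUT + 1 FF


def observed_ratio(les: int, slices: int) -> float:
    """LEs used per slice used, for the same design on both parts."""
    return les / slices


def fraction_of_nominal(les: int, slices: int) -> float:
    """How close the observed LE count is to the nominal 2-per-slice figure."""
    return les / (slices * NOMINAL_LES_PER_SLICE)


ratio = observed_ratio(les=71, slices=40)
share = fraction_of_nominal(les=71, slices=40)
print(f"observed: {ratio:.3f} LEs per slice")
print(f"that is {share:.1%} of the nominal {NOMINAL_LES_PER_SLICE} LEs per slice")
```

For this one design the LE count comes out a little under the nominal figure, which is what you would expect if some of the 40 slices were only partly filled.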
Hi Eric,

> I realize a blanket statement like "71 LE's (approx. =) 40 Slices" is totally
> dependent on how the code is synthesized.
>
> But is an approximate 1 Slice = 2 LE's a pretty close all-around
> estimate?
Yes, that's a good first-order estimate. We believe that 1 Slice is equal to
about 1.8 LEs based on average results across a suite of designs, but
mileage will vary from design to design. This lines up well with your
result, though.

One thing you should do is ensure that the CAD tool is trying to use as few
LEs (and slices for Xilinx) as possible. When you are not filling up the
device, Quartus will not try too hard to put LUTs and FFs into the same
LE; if there's any chance it will hurt rather than help timing, it will
avoid it. When you start filling the device close to capacity, Quartus will
try to pack more aggressively. This is the default "auto" setting for
register packing.

To artificially force Quartus to pack as aggressively as possible into LEs,
go to the menu Assignments/Settings..., select the Fitter Settings tab, and
click the "More Settings..." button. There is a setting called "Auto Packed
Registers -- Max II". Setting this to "Minimize Area w/Chains" will cause the
most aggressive packing.

Also, under the Analysis & Synthesis Settings tab, you can try out the
"area" optimization technique, which heuristically cares more about area
than delay, though it doesn't always reduce LE count.

Regards,

Paul Leventis
Altera Corp.
"Guitarman" <ericjohnholland@hotmail.com> wrote in message
news:90282e35.0410151112.77a87654@posting.google.com...
> I realize a blanket statement like "71 LE's (approx. =) 40 Slices" is totally
> dependent on how the code is synthesized.
>
> But is an approximate 1 Slice = 2 LE's a pretty close all-around
> estimate?
I disagree. The two architectures are different, so you can't compare them
this way. How many slices does the following code take?

    .....
    DI    : in  std_logic;
    DO    : out std_logic;
    CLOCK : in  std_logic;
    .....
    .......
    signal temp : std_logic_vector(15 downto 0);
    ......
    begin
      -- 16-bit shift register: DI shifts in, DO is the oldest bit
      Demo : process(CLOCK)
      begin
        if rising_edge(CLOCK) then
          temp <= temp(14 downto 0) & DI;
        end if;
      end process Demo;
      DO <= temp(15);
    ....
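One plausible reading of this example (the poster does not spell it out) is that a Spartan-3 LUT can be configured as a 16-bit shift register (SRL16), while a Max II LE has no such mode, so each stage costs one LE flip-flop. The Python sketch below just does that arithmetic. It assumes the synthesis tool actually infers the SRL16 for this code and that nothing else shares the slice or LEs, neither of which is confirmed in the thread.

```python
from math import ceil


def slices_with_srl16(depth: int, lut_depth: int = 16, luts_per_slice: int = 2) -> int:
    """Slices needed if each run of up to 16 stages fits in one LUT used as an SRL16.

    Assumption: the tool infers SRL16s and packs two of them per slice.
    """
    luts = ceil(depth / lut_depth)
    return ceil(luts / luts_per_slice)


def les_with_ff_chain(depth: int) -> int:
    """LEs needed if every stage uses the flip-flop of its own LE."""
    return depth


depth = 16  # width of `temp` in the VHDL above
slices = slices_with_srl16(depth)
les = les_with_ff_chain(depth)
print(f"{depth}-stage shift register: {slices} slice(s) vs {les} LE(s)")
```

Under those assumptions the ratio for this particular design is nowhere near 1 slice to 2 LEs, which is the kind of design-dependent outlier an average conversion factor cannot capture.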
>One thing you should do is ensure that the CAD tool is trying to use as few
>LEs (and slices for Xilinx) as possible. When you are not filling up the
>device, Quartus will not try too hard to put LUTs and FFs into the same
>LE -- if there's any chance it will hurt rather than help timing, it will
>avoid it. When you start filling the device close to capacity, Quartus will
>try to pack more aggressively. This is the default "auto" setting for
>register packing.
What would make the timing better if the LUT and FF are not packed in the
same LE?

I'm assuming that there is a very good path connecting the LUT/FF in the
same LE because it is such a common case. What makes not using that faster?
Hal Murray wrote:
> What would make the timing better if the LUT and FF are not packed
> in the same LE?
>
> I'm assuming that there is a very good path connecting the LUT/FF in
> the same LE because it is such a common case. What makes not
> using that faster?

He is not talking about a LUT and FF that are connected; he means ones that
are separate, like a FF with the D input connected to the output of another
FF, and a LUT that has its output going to another LUT only. Unless there is
a shortage of IO in the LAB, they can share the same LE. The same is true of
the Xilinx slice. Due to crowding of the routing, it may result in a faster
design to keep them separate.

-- 
Rick "rickman" Collins
Hi Hal, Rick:

> > What would make the timing better if the LUT and FF are not packed
> > in the same LE?
>
> He is not talking about a LUT and FF that are connected, he means ones
> that are separate. Like a FF with the D input connected to the output
> of another FF and a LUT that has its output going to another LUT only.
> Unless there is a shortage of IO in the LAB, they can share the same
> LE.
Rick's got it mostly right. The Stratix/Cyclone/Max II LE/ALMs can have a
number of register/LUT pairings:

1. LUT feeds FF
2. FF feeds LUT
3. Unrelated FF and 3-input LUT
4. FF->FF connection from adjacent LE and a 4-input LUT (a register chain)

For example, we could pack an 8-bit shift register in with 7 4-LUTs and
1 3-LUT to form 8 LEs.

As Hal observed, it seems like doing #1 (or #2) is always a win. If you look
at one FF, in our architecture we can choose to pack it with its fan-in (#1)
or fan-out (#2). For example, if the critical path of the design is on the
output of the FF, through only one of its LUTs, using packing #2 is the
better choice for that flop. So there is an interesting optimization problem
here.

Some of the LEs created by #1 or #2 will have two separate LE outputs (the
flop and the LUT) in the event that the FF/LUT connection is not single
fanout. In theory, these multiple-output LEs create a bit more routing
pressure, and so you may hurt timing more by making one than you gain by
bringing the FF and LUT together. But our routing architecture has been
designed to tolerate aggressive packing.

One way that using packing #1 or #2 can be sub-optimal is in the event where
the flop really wants to be placed somewhere in between all the things that
it feeds and that feed it. Packing it with either source or destination
might help one path, but hurt others more than if you just left the FF in a
separate LE and thus were free to move it where it wanted to be during
placement.

Now, when you look at #3, you must be intelligent in how you pack. If you
take two unrelated functions that otherwise would want to be in opposite
corners of the chip and put them together, you can hurt timing. Also, as
Rick points out, LEs of this type will have 4 inputs and 2 outputs; if you
make many of them you can start stressing the routing, and this can lead to
lower performance.

Incidentally, this packing problem also arises on Stratix II when it comes
to packing multiple functions into an ALM: if they are unrelated, you must
choose pairings wisely to not hurt performance.

Packing #4 is particularly nasty from a CAD perspective. Creating these
packings implies a group of LEs that must all be placed in the same LAB
(register chain) and must move as a group. This further restricts placement
and routing choices, and thus has the largest chance of being a net
negative. But it can also help reduce the number of LEs in some designs.

Note: The more you pack together into LEs, the closer in general you can
place the LEs of a design, so doing these packings can also help
performance :-)
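The 8-bit shift register example can be tallied with a small Python sketch. The model is a simplification and partly my own reading: it assumes the first flop of the chain takes its data from general routing (packing #3, leaving a 3-input LUT), while each later flop is fed by the adjacent LE (packing #4, leaving the full 4-input LUT). The function names are hypothetical and do not correspond to anything in Quartus.

```python
def les_unpacked(n_ffs: int, n_luts: int) -> int:
    """LE count if no flop shares an LE with an unrelated LUT."""
    return n_ffs + n_luts


def pack_register_chain(n_ffs: int) -> list[int]:
    """Return the LUT width still usable in each LE of a packed shift register.

    Assumed model: the first flop uses packing #3 (3-input LUT left over),
    every following flop uses packing #4 (4-input LUT left over).
    """
    if n_ffs <= 0:
        return []
    return [3] + [4] * (n_ffs - 1)


widths = pack_register_chain(8)
print("LEs when packed:  ", len(widths))
print("4-LUTs available: ", widths.count(4))
print("3-LUTs available: ", widths.count(3))
print("LEs when unpacked:", les_unpacked(n_ffs=8, n_luts=8))
```

In this model the packed version needs half the LEs of the unpacked one, at the cost of the placement restrictions described above for register chains.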
> Same thing in the Xilinx slice. Due to crowding of the routing, it
> may result in a faster design to keep them separate.
The trade-offs are likely different here. The VII (Virtex-II) slice has some
FF packing capabilities. It can do #1, but #2 requires use of local routing
(I think). It's not clear to me from the slice diagram whether packing #3
can be done. #4 is not possible.

Also, I'm not sure how well the architecture responds to slices with
multiple outputs (using the Y and Q outputs at the same time). If it was not
architected for heavy use of both outputs, there could be more of a
routing/performance trade-off here. This is all speculation.

What I do know is that when we compare half-slice vs. LE counts on a suite
of designs, we find a ~9% advantage for Quartus + LEs over ISE + slices. We
believe that the primary reason for this difference is the increased flop
packing density available in the Altera LE.

Regards,

Paul Leventis
Altera Corp.
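The ~9% figure can be related back to the "about 1.8 LEs per slice" estimate given earlier in the thread. The post does not say exactly how the 9% is defined, so the Python sketch below tries two readings, both of which are assumptions on my part.

```python
HALF_SLICES_PER_SLICE = 2


def les_per_slice(advantage: float) -> tuple[float, float]:
    """Convert a half-slice vs. LE advantage into LEs per slice, two ways.

    Reading A: the LE count is `advantage` smaller than the half-slice count.
    Reading B: the half-slice count is `advantage` larger than the LE count.
    """
    reading_a = HALF_SLICES_PER_SLICE * (1 - advantage)
    reading_b = HALF_SLICES_PER_SLICE / (1 + advantage)
    return reading_a, reading_b


a, b = les_per_slice(0.09)
print(f"reading A: {a:.2f} LEs per slice")
print(f"reading B: {b:.2f} LEs per slice")
```

Either way the result rounds to 1.8 LEs per slice, so the two numbers quoted in the thread appear to be consistent with each other.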
In article <4170D026.5193E181@yahoo.com>, rickman  <john@bluepal.net> wrote:
>Hal Murray wrote:
>> I'm assuming that there is a very good path connecting the LUT/FF in
>> the same LE because it is such a common case. What makes not
>> using that faster?
>
>He is not talking about a LUT and FF that are connected, he means ones
>that are separate. Like a FF with the D input connected to the output
>of another FF and a LUT that has its output going to another LUT only.
>Unless there is a shortage of IO in the LAB, they can share the same
>LE. Same thing in the Xilinx slice. Due to crowding of the routing, it
>may result in a faster design to keep them separate.

Not just routing, but also placement: the separate pieces (FFs, LUTs, etc.)
are not placed independently, but are packed together and then placed. Thus
if unrelated logic is packed together inappropriately, the placement for the
packed component may be significantly worse than if each component was
placed separately.

-- 
Nicholas C. Weaver. to reply email to "nweaver" at the domain
icsi.berkeley.edu