Reply by Walter Gallegos October 19, 2004
>> Fire your synthesize tool and see how much resources you'd really need!
Yes, this is my point. The two structures have different resources, so when you write your code, your coding style makes the difference.

Walter.

"Arash Salarian" <arash.salarian@epfl.ch> wrote in message
news:417397f6$1@epflnews.epfl.ch...
> "Walter Gallegos" <walter@chasque.apc.org> wrote in message
> news:10n2h60q87tig87@news.supernews.com...
> > The answer is
> >
> > 1 slice in a Spartan 3
> > 16 LEs in a MAX-II
> >
> > Can you compare these architectures as 1 Slice = 2 LEs?
> >
>
> I agree that there are some areas where you can't simply compare the two
> architectures. For example, I had an old design with an Altera 10K series
> that used a fully async RAM block. Now, move it to a Spartan 3 architecture
> and you see that you would need the whole chip just to make that block of
> async RAM!
>
> However, it is perfectly understandable that a user might need to compare
> the available options, and to do this he/she would need rough
> estimates to compare a Xilinx device to an Altera one. For example,
> recently I had this interesting offer for an FPGA prototype board at
> the same price of $99 with either an Altera EP1C12 or a Xilinx XC3S400.
> I would like to use a prototype board for very different designs, so I
> had to compare the two chips. As I program in VHDL and use synthesis
> tools, I don't really care about any specific architecture (unless
> something like your example or my example above happens), and the thing
> that matters in cases like that is that you only look for the BIGGER
> FPGA. To do that, you need to compare, and to compare you can only use
> rough estimates.
>
> Personally, I find the simple equation of 1 Slice = 2 LE a very good rough
> estimate, and for many designs it gives you a good answer. You have a very
> specific design and need a very good answer? Fire your synthesize tool and
> see how much resources you'd really need!
Reply by Ray Andraka October 18, 2004
Yeah, me too.

glen herrmannsfeldt wrote:

> Ray Andraka wrote:
>
> > Depends heavily on the design. Xilinx packs tighter for certain
> > arithmetic because of the structure of the LUT and carry chain: Altera's
> > carry chain through Stratix breaks the 4-LUT into a pair of 3-LUTs, one
> > for sum and one for carry, so it limits the number of inputs per bit.
>
> (snip)
>
> I still miss the XC4000 series, where the carry chain was separate
> from the LUTs, for convenient implementation of saturating adders
> and MAX(a,b) functions by feeding the carry out or overflow
> back to an LUT input.
>
> -- glen

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

"They that give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759
Reply by glen herrmannsfeldt October 18, 2004

Ray Andraka wrote:

> Depends heavily on the design. Xilinx packs tighter for certain
> arithmetic because of the structure of the LUT and carry chain: Altera's
> carry chain through Stratix breaks the 4-LUT into a pair of 3-LUTs, one
> for sum and one for carry, so it limits the number of inputs per bit.

(snip)

I still miss the XC4000 series, where the carry chain was separate
from the LUTs, for convenient implementation of saturating adders
and MAX(a,b) functions by feeding the carry out or overflow
back to an LUT input.

-- glen
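
As a rough illustration of the two functions glen mentions, here is a minimal VHDL sketch of an unsigned saturating adder and a MAX(a,b). The entity name `sat_add_max`, the `WIDTH` generic, and the choice of unsigned arithmetic are assumptions made for this example and do not come from the thread. How a given tool maps this onto carry chain and LUTs depends on the device and the synthesizer.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity: names and widths are illustrative only.
entity sat_add_max is
  generic (WIDTH : positive := 8);
  port (
    a, b    : in  unsigned(WIDTH - 1 downto 0);
    sat_sum : out unsigned(WIDTH - 1 downto 0);
    max_ab  : out unsigned(WIDTH - 1 downto 0)
  );
end entity sat_add_max;

architecture rtl of sat_add_max is
  -- One extra bit so the carry out is visible as sum_ext(WIDTH).
  signal sum_ext : unsigned(WIDTH downto 0);
begin

  sum_ext <= resize(a, WIDTH + 1) + resize(b, WIDTH + 1);

  -- Unsigned saturation: the carry out selects all ones
  -- instead of the wrapped-around sum.
  sat_sum <= (others => '1') when sum_ext(WIDTH) = '1'
             else sum_ext(WIDTH - 1 downto 0);

  -- MAX(a, b): the comparison is effectively a subtraction whose
  -- carry/borrow steers a 2:1 multiplexer.
  max_ab <= a when a >= b else b;

end architecture rtl;
```

In both outputs the carry result has to come back into a multiplexer, which is the "feed the carry out back to a LUT input" step described above.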
Reply by Ray Andraka October 18, 2004
Depends heavily on the design.  Xilinx packs tighter for certain
arithmetic because of the structure of the LUT and carry chain: Altera's
carry chain through Stratix breaks the 4-LUT into a pair of 3-LUTs, one
for sum and one for carry, so it limits the number of inputs per bit.
Stratix adds a little bit of extra logic to the LE to allow implementation
of an adder/subtractor without going to two levels of logic, and there is
a way to load data bypassing the adder, which provides single-level
solutions for those specific (and fairly common) cases.  Xilinx will also
allow you to turn a LUT into a 16-element shift register, which can be
very handy not only for shift register delays, but also for reloadable
LUTs, which are useful for things like adaptive DA (distributed
arithmetic) filters.  Altera has more options for the memory structure,
which in many cases makes it more efficient for certain types of designs
requiring memory.  My point is that both vendors' offerings have some
strong points, and which one is best depends heavily on your application.
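
For readers who want to see how the adder/subtractor case behaves in their own tools, a minimal registered add/subtract in VHDL might look like the sketch below. The entity name `addsub`, the port names, and the `WIDTH` generic are assumptions for illustration. Whether the result fits in a single logic level per bit depends on the target family and the synthesis tool, so the only reliable check is to compile it for each device and compare the reports.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity: names and widths are illustrative only.
entity addsub is
  generic (WIDTH : positive := 8);
  port (
    clk    : in  std_logic;
    sub    : in  std_logic;  -- '0' selects a + b, '1' selects a - b
    a, b   : in  signed(WIDTH - 1 downto 0);
    result : out signed(WIDTH - 1 downto 0)
  );
end entity addsub;

architecture rtl of addsub is
begin

  process (clk)
  begin
    if rising_edge(clk) then
      -- One arithmetic operator per branch; synthesis tools usually
      -- merge these into a single carry chain with a mode input.
      if sub = '1' then
        result <= a - b;
      else
        result <= a + b;
      end if;
    end if;
  end process;

end architecture rtl;
```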

Guitarman wrote:

> Hello All,
>
> I've been designing with Xilinx FPGAs for a while so I'm used to the
> "Slice" concept. I'm looking at Altera's Max II as a nice possible
> solution for a design.
>
> I took my VHDL code and it synthesized to 40 Slices in a Spartan III.
> Then I took the same code and synthesized it for a Max II (using
> Quartus II now) and it was 71 LE's.
>
> I realize a blanket statement 71 LE's (approx. =) 40 Slices, is totally
> dependent on how the code is synthesized.
>
> But is an approximate 1 Slice = 2 LE's a pretty close all around
> estimate?
>
> Thanks
> Eric
Reply by Arash Salarian October 18, 2004
"Walter Gallegos" <walter@chasque.apc.org> wrote in message
news:10n2h60q87tig87@news.supernews.com...
> The answer is
>
> 1 slice in a Spartan 3
> 16 LEs in a MAX-II
>
> Can you compare these architectures as 1 Slice = 2 LEs?
>

I agree that there are some areas where you can't simply compare the two
architectures. For example, I had an old design with an Altera 10K series
that used a fully async RAM block. Now, move it to a Spartan 3 architecture
and you see that you would need the whole chip just to make that block of
async RAM!

However, it is perfectly understandable that a user might need to compare
the available options, and to do this he/she would need rough estimates to
compare a Xilinx device to an Altera one. For example, recently I had this
interesting offer for an FPGA prototype board at the same price of $99
with either an Altera EP1C12 or a Xilinx XC3S400. I would like to use a
prototype board for very different designs, so I had to compare the two
chips. As I program in VHDL and use synthesis tools, I don't really care
about any specific architecture (unless something like your example or my
example above happens), and the thing that matters in cases like that is
that you only look for the BIGGER FPGA. To do that, you need to compare,
and to compare you can only use rough estimates.

Personally, I find the simple equation of 1 Slice = 2 LE a very good rough
estimate, and for many designs it gives you a good answer. You have a very
specific design and need a very good answer? Fire your synthesize tool and
see how much resources you'd really need!
Reply by Walter Gallegos October 16, 2004
The answer is

      1 slice in a Spartan 3
     16 LEs   in a MAX-II

Can you compare these architectures as 1 Slice = 2 LEs?
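
For anyone who wants to repeat this experiment, the shift register fragment quoted below can be completed into a self-contained entity along the following lines. The ports, the `temp` signal, and the `Demo` process follow the quoted fragment. The entity name `shift16` and the architecture name are assumptions, since the original elides them.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- 16-bit serial-in, serial-out shift register with no reset and no
-- access to the intermediate taps. The entity name is hypothetical.
entity shift16 is
  port (
    CLOCK : in  std_logic;
    DI    : in  std_logic;
    DO    : out std_logic
  );
end entity shift16;

architecture rtl of shift16 is
  signal temp : std_logic_vector(15 downto 0);
begin

  Demo : process (CLOCK)
  begin
    if rising_edge(CLOCK) then
      -- Shift left by one: drop temp(15), bring DI in at temp(0).
      temp <= temp(14 downto 0) & DI;
    end if;
  end process Demo;

  -- A bit appears on DO after the 16th rising clock edge
  -- following its arrival on DI.
  DO <= temp(15);

end architecture rtl;
```

The description has no reset and reads only the last tap. That is generally what lets a Xilinx tool map all 16 stages into one LUT configured as a shift register, while an architecture without that LUT mode needs one flip-flop per stage. Adding a reset or reading intermediate taps would typically change the Xilinx result as well.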

Walter.


"Walter Gallegos" <walter@chasque.apc.org> wrote in message
news:10n13v2dqalbv6a@news.supernews.com...
>
> "Guitarman" <ericjohnholland@hotmail.com> wrote in message
> news:90282e35.0410151112.77a87654@posting.google.com...
> > Hello All,
> >
> > I've been designing with Xilinx FPGAs for a while so I'm used to the
> > "Slice" concept. I'm looking at Altera's Max II as a nice possible
> > solution for a design.
> >
> > I took my VHDL code and it synthesized to 40 Slices in a Spartan III.
> > Then I took the same code and synthesized it for a Max II (using
> > Quartus II now) and it was 71 LE's.
> >
> > I realize a blanket statement 71 LE's (approx. =) 40 Slices, is totally
> > dependent on how the code is synthesized.
> >
> > But is an approximate 1 Slice = 2 LE's a pretty close all around
> > estimate?
> >
> > Thanks
> > Eric
>
> I disagree. The two architectures are different, so you can't compare
> them in this way.
>
> How many slices does the following code take?
> .....
>     DI    : in  std_logic;
>     DO    : out std_logic;
>     CLOCK : in  std_logic;
> .....
> .......
>     signal temp: std_logic_vector(15 downto 0);
> ......
> begin
>
>     Demo : process(CLOCK)
>     begin
>         if rising_edge(CLOCK) then
>             temp <= temp(14 downto 0) & DI;
>         end if;
>     end process Demo;
>
>     DO <= temp(15);
> ....
Reply by Nicholas Weaver October 16, 2004
In article <4170D026.5193E181@yahoo.com>, rickman  <john@bluepal.net> wrote:
>Hal Murray wrote:
>> I'm assuming that there is a very good path connecting the LUT/FF in
>> the same LE because it is such a common case. What makes not
>> using that faster?
>
>He is not talking about a LUT and FF that are connected, he means ones
>that are separate. Like a FF with the D input connected to the output
>of another FF and a LUT that has its output going to another LUT only.
>Unless there is a shortage of IO in the LAB, they can share the same
>LE. Same thing in the Xilinx slice. Due to crowding of the routing, it
>may result in a faster design to keep them separate.

Not just routing, but also placement: the separate pieces (FFs, LUTs,
etc.) are not placed independently, but are packed together and then
placed. Thus if unrelated logic is packed together inappropriately, the
placement for the packed component may be significantly worse than if
each component was placed separately.

--
Nicholas C. Weaver. to reply email to
"nweaver" at the domain icsi.berkeley.edu
Reply by Paul Leventis (at home) October 16, 2004
Hi Hal, Rick:

> > What would make the timing better if the LUT and FF are not packed
> > in the same LE?
> He is not talking about a LUT and FF that are connected, he means ones
> that are separate. Like a FF with the D input connected to the output
> of another FF and a LUT that has its output going to another LUT only.
> Unless there is a shortage of IO in the LAB, they can share the same
> LE.

Rick's got it mostly right. The Stratix/Cyclone/Max II LE/ALMs can have a
number of register/LUT pairings:

1. LUT feeds FF
2. FF feeds LUT
3. Unrelated FF and 3-input LUT
4. FF->FF connection from adjacent LE and a 4-input LUT (a register chain)

For example, we could pack an 8-bit shift register in with 7 4-LUTs and
1 3-LUT to form 8 LEs.

As Hal observed, it seems like doing #1 (or #2) is always a win. If you
look at one FF, in our architecture we can choose to pack it with its
fan-in (#1) or fan-out (#2). For example, if the critical path of the
design is on the output of the FF, through only one of its LUTs, using
packing #2 is the better choice for that flop. So there is an interesting
optimization problem here.

Some of the LEs created by #1 or #2 will have two separate LE outputs (the
flop and the LUT) in the event that the FF/LUT connection is not single
fanout. In theory, these multiple-output LEs create a bit more routing
pressure, and so you may hurt timing more by making one than you help by
bringing the FF and LUT together. But our routing architecture has been
designed to tolerate aggressive packing.

One way that using packing #1 or #2 can be sub-optimal is in the event
where the flop really wants to be placed somewhere in between all the
things that it feeds and that feed it. Packing it with either source or
destination might help one path, but hurt others more than if you just
left the FF in a separate LE and thus were free to move it where it wanted
to be during placement.

Now, when you look at #3, you must be intelligent in how you pack. If you
take two unrelated functions that otherwise would want to be in opposite
corners of the chip and put them together, you can hurt timing. Also, as
Rick points out, LEs of this type will have 4 inputs and 2 outputs; if you
make many of them you can start stressing the routing and this can lead to
lower performance. Incidentally, this packing problem also arises on
Stratix II when it comes to packing multiple functions into an ALM -- if
they are unrelated, you must choose pairings wisely to not hurt
performance.

Packing #4 is particularly nasty from a CAD perspective. Creating these
packings implies a group of LEs that must all be placed in the same LAB
(register chain) and must move as a group. This further restricts
placement and routing choices, and thus has the largest chance of being a
net negative. But it can also help reduce the number of LEs in some
designs.

Note: The more you pack together into LEs, the closer in general you can
place the LEs of a design, so doing these packings can also help
performance :-)

>Same thing in the Xilinx slice. Due to crowding of the routing, it
> may result in a faster design to keep them separate.

The trade-offs are likely different here. The Virtex-II slice has some FF
packing capabilities. It can do #1, but #2 requires use of local routing
(I think). It's not clear to me from the slice diagram whether packing #3
can be done. #4 is not possible. Also, I'm not sure how well the
architecture responds to slices with multiple outputs (using the Y and Q
outputs at the same time). If it was not architected for heavy use of both
outputs, there could be more of a routing/performance trade-off here.

This is all speculation. What I do know is that when we compare half-slice
vs. LE counts on a suite of designs, we find a ~9% advantage for Quartus +
LEs over ISE + slices. We believe that the primary reason for this
difference is the increased flop packing density available in the Altera
LE.

Regards,

Paul Leventis
Altera Corp.
Reply by rickman October 16, 2004
Hal Murray wrote:
>
> >One thing you should do is ensure that the CAD tool is trying to use as few
> >LEs (and slices for Xilinx) as possible. When you are not filling up the
> >device, Quartus will not try too hard to put LUTs and FFs into the same
> >LE -- if there's any chance it will hurt rather than help timing, it will
> >avoid it. When you start filling the device close to capacity, Quartus will
> >try to pack more aggressively. This is the default "auto" setting for
> >register packing.
>
> What would make the timing better if the LUT and FF are not packed
> in the same LE?
>
> I'm assuming that there is a very good path connecting the LUT/FF in
> the same LE because it is such a common case. What makes not
> using that faster?

He is not talking about a LUT and FF that are connected, he means ones
that are separate. Like a FF with the D input connected to the output
of another FF and a LUT that has its output going to another LUT only.
Unless there is a shortage of IO in the LAB, they can share the same
LE. Same thing in the Xilinx slice. Due to crowding of the routing, it
may result in a faster design to keep them separate.

--

Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design      URL http://www.arius.com
4 King Ave                               301-682-7772 Voice
Frederick, MD 21701-3110                 301-682-7666 FAX
Reply by Hal Murray October 16, 2004
>One thing you should do is ensure that the CAD tool is trying to use as few
>LEs (and slices for Xilinx) as possible. When you are not filling up the
>device, Quartus will not try too hard to put LUTs and FFs into the same
>LE -- if there's any chance it will hurt rather than help timing, it will
>avoid it. When you start filling the device close to capacity, Quartus will
>try to pack more aggressively. This is the default "auto" setting for
>register packing.

What would make the timing better if the LUT and FF are not packed
in the same LE?

I'm assuming that there is a very good path connecting the LUT/FF in
the same LE because it is such a common case. What makes not
using that faster?

--
The suespammers.org mail server is located in California. So are all my
other mailboxes. Please do not send unsolicited bulk e-mail or unsolicited
commercial e-mail to my suespammers.org address or any of my other addresses.
These are my opinions, not necessarily my employer's. I hate spam.