comp.arch.fpga | Why the second flip-flop in Virtex-6?

Hello,

In case anyone hasn't already seen, Xilinx has some preliminary
information about Virtex-6 and Spartan-6 online here -
http://www.xilinx.com/products/v6s6.htm .

I do have a question about Virtex-6 and its one LUT6/two flip-flop
architecture. I'm struggling to think of why a user would have any use
for that second flip-flop. It seems to me that the second flip-flop
is only useful when the LUT6 is split into two LUT5s. However, the
family overview indicates that a dual-LUT5 has the same restriction as
in Virtex-5 - the inputs to the dual-LUT5s have to be the same. I know
in my designs I don't tend to get many LUT5s synthesized, so I'm not
sure how often that actually happens. The only other case I can even
think of is to use the second flip-flop solely as a storage element,
but without the ability to drive the clock enable input of the flop by
some sort of combinational signal (i.e., an address decode) without
"spending" the associated LUT6, its use seems very limited.

I am very cognizant of the fact that the people here and at Xilinx are
smarter than me. So, I figured that I'd give them a chance to explain
the design choice. I'm always interested in ways to use FPGA resources
more effectively.

Thanks!

- Nathan

Reply by rickman ● February 2, 2009

Nathan Bialke wrote:
> Hello,
>
> In case anyone hasn't already seen, Xilinx has some preliminary
> information about Virtex-6 and Spartan-6 online here -
> http://www.xilinx.com/products/v6s6.htm .
>
> I do have a question about Virtex-6 and its one LUT6/two flip-flop
> architecture. I'm struggling to think of why a user would have any use
> for that second flip-flop. It seems to me that the second flip-flop
> is only useful when the LUT6 is split into two LUT5s. However, the
> family overview indicates that a dual-LUT5 has the same restriction as
> in Virtex-5 - the inputs to the dual-LUT5s have to be the same. I know
> in my designs I don't tend to get many LUT5s synthesized, so I'm not
> sure how often that actually happens. The only other case I can even
> think of is to use the second flip-flop solely as a storage element,
> but without the ability to drive the clock enable input of the flop by
> some sort of combinational signal (i.e., an address decode) without
> "spending" the associated LUT6, its use seems very limited.
>
> I am very cognizant of the fact that the people here and at Xilinx are
> smarter than me. So, I figured that I'd give them a chance to explain
> the design choice. I'm always interested in ways to use FPGA resources
> more effectively.

I haven't looked at the new Xilinx architecture, but it sounds like
the capabilities provided by shrinking geometries have reached the
point of saying bye-bye to the highly regarded LUT4.  We discussed
this recently here and it was mentioned that as geometries continue to
shrink, it will become more advantageous to provide more capability
in the logic with relatively reduced routing.  By "relatively"
reduced, I mean they will not have as much routing per "gate
equivalent" as the logic cells get more complex.  It actually makes
sense to do this and I am a bit surprised that it has taken this long
to get to the LUT5/6 instead of the LUT4/5.  In the meantime they have
added first memory blocks, then MACs and finally more functional DSP
units.  Along the way the I/O has been souped up with high speed
serdes, but that is not really related to the logic/routing mix.

The question of LUTn vs FF ratio is a tradeoff.  Different designs have
different needs.  It is not at all uncommon for comms designs to use
naked FFs for delay elements.  Other designs might use only a
fraction as many FFs as LUTs, although I don't know how this will
change with 6 LUTs.  It is very common for multiple 4 LUTs to feed a
single FF.  So it is not surprising that more FFs are included for a
given number of 6 LUTs.

So what other functional blocks will be coming along as the densities
continue to increase?  Some MCU devices provide complex serial I/O
units that can flexibly drive multiple serial I/O interface types.
Likewise they often provide widely capable timer functions.  These
specific functions may not have wide utility in FPGAs, but I expect
some types of generically capable logic other than LUTs, memory and
DSP blocks will be identified and implemented.  I think that LUTs can
only go so far. The problem of selling the routing and giving the
logic for free limits how low the price can get and therefore use in
the highest volume applications.

Rick

Reply by glen herrmannsfeldt ● February 2, 2009

Nathan Bialke <nathan@bialke.com> wrote:

> In case anyone hasn't already seen, Xilinx has some preliminary
> information about Virtex-6 and Spartan-6 online here -
> http://www.xilinx.com/products/v6s6.htm .

> I do have a question about Virtex-6 and its one LUT6/two flip-flop
> architecture. I'm struggling to think of why a user would have any use
> for that second flip-flop. It seems to me that the second flip-flop
> is only useful when the LUT6 is split into two LUT5s. However, the
> family overview indicates that a dual-LUT5 has the same restriction as
> in Virtex-5 - the inputs to the dual-LUT5s have to be the same.

If the inputs to an LUT5 are the same, that doesn't mean the outputs
are the same.  I could easily imagine designs that would take five
inputs, put them through two different blocks of logic, and 
register the two outputs.
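
As a rough illustration of that point, here is a small Python model of one LUT6 used as two LUT5s that share the same five inputs, each feeding its own flip-flop. The class and function names, and the choice of parity and majority as the two functions, are assumptions made for this sketch and do not come from any Xilinx documentation.

```python
from itertools import product

def make_lut5(func):
    """Precompute a 32-entry truth table for a 5-input Boolean function."""
    return [func(*bits) & 1 for bits in product((0, 1), repeat=5)]

def lut5_read(table, inputs):
    """Look up the output for a tuple of five input bits (first bit is the MSB)."""
    index = 0
    for bit in inputs:
        index = (index << 1) | bit
    return table[index]

# Two unrelated functions of the same five inputs (illustrative choices).
parity = make_lut5(lambda a, b, c, d, e: a ^ b ^ c ^ d ^ e)
majority = make_lut5(lambda a, b, c, d, e: int(a + b + c + d + e >= 3))

class FracturedLut6:
    """Hypothetical model: two LUT5s with shared inputs, one flip-flop each."""

    def __init__(self, table_a, table_b):
        self.table_a = table_a
        self.table_b = table_b
        self.ff_a = 0
        self.ff_b = 0

    def clock(self, inputs):
        """On a clock edge, register both LUT5 outputs at once."""
        self.ff_a = lut5_read(self.table_a, inputs)
        self.ff_b = lut5_read(self.table_b, inputs)
        return self.ff_a, self.ff_b

cell = FracturedLut6(parity, majority)
```

Clocking `cell` with the same five inputs yields two independent registered results, which is the case where the second flip-flop earns its keep.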

-- glen

Reply by Benjamin Couillard ● February 2, 2009

FYI, to obtain the number of logic cells in a Spartan-6 device, you
multiply the number of slices by 6.4, even though there are only 4
LUTs in each slice. Basically, Xilinx expects each 6-input LUT to be
1.6 times more efficient than a 4-input LUT. Simple arithmetic tells
us that it's 1.5, so I suppose that the people at Xilinx have their
reasons. Time will tell if that comparison is valid, though I suppose
it depends on the kind of logic you use, whether or not you can split
your logic into two 5-input LUTs, etc. Plus, only 25% of the slices will
be able to implement SRL32. I don't know why, but it remains to be seen
what kind of impact this will have.
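
The arithmetic above can be sketched in a few lines of Python. The 6.4 and 4 figures are the ones quoted in this post; the 1.5 is presumably the ratio of LUT input counts (6/4), which is an assumption about what "simple arithmetic" means here. The slice count of 1000 is a made-up example, not a real device.

```python
LUTS_PER_SLICE = 4            # figure quoted in the post
LOGIC_CELLS_PER_SLICE = 6.4   # marketing multiplier quoted in the post

def logic_cells(slices):
    """Logic cell count as described in the post: slices times 6.4."""
    return slices * LOGIC_CELLS_PER_SLICE

# Implied "4-input LUT equivalents" credited to each 6-input LUT.
cells_per_lut = LOGIC_CELLS_PER_SLICE / LUTS_PER_SLICE

# Assumed reading of "simple arithmetic": ratio of input counts.
input_ratio = 6 / 4

example_slices = 1000  # hypothetical slice count, for illustration only
print(logic_cells(example_slices), cells_per_lut, input_ratio)
```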

On the bright side, there are now integrated memory controllers for
DDR, DDR-II and DDR-3, apparently up to 800 MHz DDR.

So far, it seems promising.

My 2 cents

Reply by -jg ● February 2, 2009

Nathan Bialke wrote:

> Hello,
>
> In case anyone hasn't already seen, Xilinx has some preliminary
> information about Virtex-6 and Spartan-6 online here -
> http://www.xilinx.com/products/v6s6.htm .
>
> I do have a question about Virtex-6 and its one LUT6/two flip-flop
> architecture. I'm struggling to think of why a user would have any use
> for that second flip-flop.

Hopefully the tools will be able to use it. :)

They have split to three slice types:
Slice M, L, X - and the most comprehensive has SRL32, 2 x SRL16, 2 x 32
bit RAM
- with those dual choices, it also points to a dual-output / dual
flip-flop cell being needed to handle their outputs.

All sounds logical to me.

-jg

Reply by Kolja Sulimma ● February 2, 2009

On 2 Feb., 21:33, Benjamin Couillard <benjamin.couill...@gmail.com>
wrote:
> FYI, to obtain the number of logic cells in a Spartan-6 device, you
> multiply the number of slices by 6.4, even though there are only 4
> LUTs in each slice. Basically, Xilinx expects each 6-input LUT to be
> 1.6 times more efficient than a 4-input LUT. Simple arithmetic tells
> us that it's 1.5,

How is that? Why should a 6 LUT be 1.5 times a 4 LUT?

If all functions were equally likely to occur in a netlist, then the
ratio would be 4x.
(There are O(2**(2**N)) functions of N inputs. It can be shown that at
least half of those functions require at least O(2**N) gates to be
implemented.)
Also, a 6-LUT is 4x as large as a 4-LUT.
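
To make the counting concrete, here is a short Python sketch. It assumes the size of a LUT is measured by its truth-table storage (one configuration bit per input combination), which is a simplification that ignores the read multiplexer and any extra per-LUT logic.

```python
def lut_config_bits(n_inputs):
    """Truth-table bits in an N-input LUT: one per input combination."""
    return 2 ** n_inputs

def distinct_functions(n_inputs):
    """Number of distinct Boolean functions of N inputs: 2**(2**N)."""
    return 2 ** (2 ** n_inputs)

# Storage ratio of a 6-LUT to a 4-LUT (64 bits versus 16 bits).
size_ratio = lut_config_bits(6) // lut_config_bits(4)

print(lut_config_bits(4), lut_config_bits(6), size_ratio)
print(distinct_functions(4), distinct_functions(6))
```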

On the other hand, adders are among the most common functions in real
world circuits, and for these even a 4-LUT is wasted area and a 6-LUT
gains nothing.
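
A minimal sketch of why adders gain so little from wide LUTs, assuming a carry-chain style of implementation in which each bit's LUT only computes a two-input "propagate" term and dedicated logic outside the LUT does the rest. The function name and structure are illustrative and not tied to any particular device.

```python
def add_with_carry_chain(a, b, width):
    """Ripple adder where the per-bit LUT function uses only two inputs."""
    carry = 0
    result = 0
    for i in range(width):
        a_i = (a >> i) & 1
        b_i = (b >> i) & 1
        propagate = a_i ^ b_i                   # the only LUT work per bit
        sum_i = propagate ^ carry               # dedicated XOR on the chain
        carry = carry if propagate else a_i     # dedicated carry multiplexer
        result |= sum_i << i
    return result, carry

total, carry_out = add_with_carry_chain(13, 7, 4)
print(total, carry_out)
```

Each bit needs only a 2-input function from the LUT, so the remaining LUT inputs go unused whether the LUT has four inputs or six.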

What you really need to compare is how many gates of a typical netlist
can be covered by a single LUT on average.

The reason why it makes sense to spend 4x the logic area to cover only
1.6x the number of gates is that in today's FPGAs most of the area is
spent on routing. So blowing up the logic does not hurt the area a lot
and actually makes routing simpler.

Kolja Sulimma

Reply by glen herrmannsfeldt ● February 2, 2009

Kolja Sulimma <ksulimma@googlemail.com> wrote:
> On 2 Feb., 21:33, Benjamin Couillard <benjamin.couill...@gmail.com>
> wrote:
>> FYI, to obtain the number of logic cells in a Spartan-6 device, you
>> multiply the number of slices by 6.4, even though there are only 4
>> LUTs in each slice. Basically, Xilinx expects each 6-input LUT to be
>> 1.6 times more efficient than a 4-input LUT. Simple arithmetic tells
>> us that it's 1.5,
 
> What you really need to compare is how many gates of a typical netlist
> can be covered by a single LUT on average.

Well, the ratio of the number of 4LUTs to the number of 6LUTs to
cover an average netlist.  That makes it independent of the actual
'gate' count.  

-- glen

Reply by rickman ● February 2, 2009

On Feb 2, 3:33 pm, Benjamin Couillard <benjamin.couill...@gmail.com>
wrote:
> FYI, to obtain the number of logic cells in a Spartan-6 device, you
> multiply the number of slices by 6.4, even though there are only 4
> LUTs in each slice. Basically, Xilinx expects each 6-input LUT to be
> 1.6 times more efficient than a 4-input LUT.

This is all too bizarre.  No other company in the world counts "logic
cells" or "slices".  No one but Xilinx knows what a "logic cell" is
and I don't know what a slice is in the Spartan 6 parts because I
can't open the Spartan 6 overview in Acrobat 6.  For whatever reason I
can't install Acrobat Reader 8 and Sumatra PDF seems to be considered
a virus by my AVS and it won't let it run.

Regardless, counting logic cells and slices is pointless.  Counting
LUT4s was a valid comparison because most FPGAs used them.  But with
the introduction of LUT6s in Altera and now Xilinx parts, the jig is
up and we can't compare across families with these logic elements.

Trying to compare parts with different LUT sizes (or different
mixtures of LUTs and FFs) depends too much on the details of your
design.  Different applications require different mixtures of logic
and FFs just like you can't compare processor speeds based on the CPU
MHz or the memory bandwidth.

> Simple arithmetic tells
> us that it's 1.5, I suppose that the people at Xilinx have their
> reasons.

You mean the marketing people?...

> Time will tell, if that comparison is valid, though I suppose
> it depends on the kind of logic you use, whether or not you can split your
> logic in 2 5-input LUTs, etc. Plus only 25% of the slices will be able
> to implement SRL32, don't know why, but it remains to be seen what
> kind of impact this will have.

Using a LUT as an SRL requires added logic in the LUT.  No design will
use a large fraction of the LUTs as SRLs.  So why add the logic in all
LUTs?  The only real impact is in how to route to the SRLs that are
implemented.
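
As a rough picture of the "added logic", here is a generic Python model of a 32-bit addressable shift register built on LUT-style storage. Reading is the same lookup a plain LUT performs; the shift-in path in `clock` is the extra write capability. The class name, bit ordering, and clock-enable behavior are assumptions for illustration, not a model of the exact Xilinx primitive.

```python
class Srl32:
    """Generic 32-bit shift register with an addressable output tap."""

    def __init__(self):
        self.bits = [0] * 32   # same storage a 5-input lookup would read

    def clock(self, d, ce=1):
        """Extra write path: shift one bit in when clock enable is set."""
        if ce:
            self.bits = [d & 1] + self.bits[:-1]

    def read(self, addr):
        """Ordinary LUT-style read: the address selects one stored bit."""
        return self.bits[addr & 31]

srl = Srl32()
for bit in (1, 0, 0):
    srl.clock(bit)

# The 1 shifted in first now sits at tap 2, i.e. a three-clock delay.
print(srl.read(0), srl.read(1), srl.read(2))
```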

> On the bright side, there are now integrated memory controllers for
> DDR, DDR-II and DDR-3, apparently up to 800 MHz DDR .

Is that in the Virtex only or also the Spartan parts?

Rick

Reply by Benjamin Couillard ● February 2, 2009

On 2 Feb, 21:59, rickman <gnu...@gmail.com> wrote:
> On Feb 2, 3:33 pm, Benjamin Couillard <benjamin.couill...@gmail.com>
> wrote:
>
> > FYI, to obtain the number of logic cells in a Spartan-6 device, you
> > multiply the number of slices by 6.4, even though there are only 4
> > LUTs in each slice. Basically, Xilinx expects each 6-input LUT to be
> > 1.6 times more efficient than a 4-input LUT.
>
> This is all too bizarre.  No other company in the world counts "logic
> cells" or "slices".  No one but Xilinx knows what a "logic cell" is
> and I don't know what a slice is in the Spartan 6 parts because I
> can't open the Spartan 6 overview in Acrobat 6.  For whatever reason I
> can't install Acrobat Reader 8 and Sumatra PDF seems to be considered
> a virus by my AVS and it won't let it run.

Yeah, of course, you can't really compare a 6-input LUT with a 4-input
LUT. Plus Altera seems to have a more flexible 6-input LUT than
Xilinx, so I wonder how exactly we can compare two FPGAs from two
different companies.


>
> Regardless, counting logic cells and slices is pointless.  Counting
> LUT4s was a valid comparison because most FPGAs used them.  But with
> the introduction of LUT6s in Altera and now Xilinx parts, the jig is
> up and we can't compare across families with these logic elements.
>
> Trying to compare parts with different LUT sizes (or different
> mixtures of LUTs and FFs) depends too much on the details of your
> design.  Different applications require different mixtures of logic
> and FFs just like you can't compare processor speeds based on the CPU
> MHz or the memory bandwidth.
>
> > Simple arithmetic tells
> > us that it's 1.5, I suppose that the people at Xilinx have their
> > reasons.
>
> You mean the marketing people?...

Yeah, but I hope that they have benchmarks, like FFTs, memory
controllers, MicroBlaze, etc. that will give us an idea of how
efficient the new 6-input LUT is.

>
> > Time will tell, if that comparison is valid, though I suppose
> > it depends on the kind of logic you use, whether or not you can split your
> > logic in 2 5-input LUTs, etc. Plus only 25% of the slices will be able
> > to implement SRL32, don't know why, but it remains to be seen what
> > kind of impact this will have.
>
> Using a LUT as an SRL requires added logic in the LUT.  No design will
> use a large fraction of the LUTs as SRLs.  So why add the logic in all
> LUTs?  The only real impact is in how to route to the SRLs that are
> implemented.

Yeah, you're right I guess.

>
> > On the bright side, there are now integrated memory controllers for
> > DDR, DDR-II and DDR-3, apparently up to 800 MHz DDR .
>
> Is that in the Virtex only or also the Spartan parts?
>
> Rick

Spartan-6 only, I suppose that they determined that the cost of adding
embedded memory controllers was worth it. Furthermore, it seems to be
a multiport memory controller, so the wrapping needed might be
minimal. Of course, time will tell.

Reply by David Brown ● February 3, 2009

Nathan Bialke wrote:
> Hello,
> 
> In case anyone hasn't already seen, Xilinx has some preliminary
> information about Virtex-6 and Spartan-6 online here -
> http://www.xilinx.com/products/v6s6.htm .
> 
> I do have a question about Virtex-6 and its one LUT6/two flip-flop
> architecture. I'm struggling to think of why a user would have any use
> for that second flip-flop. It seems to me that the second flip-flop
> is only useful when the LUT6 is split into two LUT5s. However, the
> family overview indicates that a dual-LUT5 has the same restriction as
> in Virtex-5 - the inputs to the dual-LUT5s have to be the same. I know
> in my designs I don't tend to get many LUT5s synthesized, so I'm not
> sure how often that actually happens. The only other case I can even
> think of is to use the second flip-flop solely as a storage element,
> but without the ability to drive the clock enable input of the flop by
> some sort of combinational signal (i.e., an address decode) without
> "spending" the associated LUT6, its use seems very limited.
> 
> I am very cognizant of the fact that the people here and at Xilinx are
> smarter than me. So, I figured that I'd give them a chance to explain
> the design choice. I'm always interested in ways to use FPGA resources
> more effectively.
> 
> Thanks!
> 

Perhaps it is something to do with Altera having used a larger LUT 
connected to two FF's for some time now (since the Stratix II, but not 
on the Cyclones).  Altera's is a bit more advanced (8-input LUT that can 
be split in many ways, but still with a maximum full LUT of 6 inputs), 
but it sounds a little "me too" from Xilinx.  Of course, if it really 
does give faster or more compact designs, then "me too" is the right move!
