FPGARelated.com
Forums

tri-state in altera

Started by digari June 2, 2004
Austin Lesea wrote:

> The tristate buffers in Virtex and all subsequent families are in fact
> separate bidirectional logic structures that simulate the behavior of a
> tristate bus.
hmm.. I guess that means when using tri-states these days I don't need
to worry about heating up a part by turning on multiple drivers fighting
on a bus. It is simply a matter of data corruption?

Jeff
Jeff,

Yes.  There is no possibility of contention, and no "X" (unknown) value.
In fact, it was a real challenge to simulate an "X" condition so that
users felt better.  Calling the result a 0 or a 1 (which is what really
happens), with no "Z" (tri-state) condition at all, made a few people
quite uncomfortable when simulating.  We had to emulate the tristate
behavior in simulation runs..... yuck!
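To make the point concrete, here is a minimal Python model of an internal "tristate" bus emulated with plain logic. The AND-OR resolution scheme below is my illustrative assumption, not the actual Xilinx replacement circuit:

```python
# Behavioral sketch of a "tristate" bus emulated with ordinary logic:
# each driver contributes (enable AND data), and the bus is the OR of
# all contributions.  NOTE: this AND-OR scheme is an illustrative
# assumption; the real replacement structure is not documented here.

def emulated_bus(drivers):
    """drivers: list of (enable, data) bit pairs -> resolved bus bit."""
    bus = 0
    for enable, data in drivers:
        bus |= enable & data
    return bus

# Two drivers fighting (both enabled, opposite data): the result is a
# well-defined 0 or 1, never an "X", and nothing overheats.
print(emulated_bus([(1, 1), (1, 0)]))  # contention resolves to 1 here
print(emulated_bus([(0, 1), (0, 0)]))  # nobody driving -> 0, not "Z"
```

Note that a floating bus also reads as a defined value, which is exactly why simulating the expected "Z" and "X" behavior took extra work.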

As to who did the "right" thing: Altera recognized early on that
tristate muxes were resource hogs and slow, and never addicted an entire
generation to them with a successful product line that had them.
Xilinx, on the other hand, has had to wean folks off of using them (in
effect, break a bad habit), because we had a large number of users who
used and liked them, even though they were less efficient and slower
than using the logic already there.

The perception of efficiency is an interesting one: if we had dedicated
more area to logic and less to tristate circuits, which is more
efficient?  This is just another reason why you can argue almost any
angle of FPGA architecture as being "good" or "bad".

Definitely a "glass half empty or glass half full" problem.  Not a whole 
lot to get excited about.

At the level most people design at now (VHDL or Verilog), instantiating
a tristate structure will automatically get mapped to logic anyway (if
you let it), or give you an error message (if you do not allow it and
the target has no tristate blocks).

Austin

Symon wrote:

> "qlyus" <qlyus@yahoo.com> wrote in message
> news:da71446f.0406031036.137fd0db@posting.google.com...
>
>> Is Spartan 3 still faster and less expensive when there are 100+
>> 16/32-bit registers on a bus?
>>
>> -qlyus
>
> That's hardly a typical application.
Perhaps we aren't typical, but we have done quite a few FPGAs that had
over 100 separate 16-bit (or wider) control or status registers.  We try
to use BRAMs for stuff like this when it makes sense, but most just end
up out in the sea of gates.

Marc
When you're trying to squeeze a pipelined RISC processor into a small
tile (say 4Rx6C of CLBs + 1 BRAM), because you intend to tile dozens or
hundreds of processors per FPGA, and your result bus needs to mux
amongst 4+ sources, and you have to burn several LUTs/bit just for lousy
*muxes*, fer gosh sakes, THEN you will shed a nostalgic tear for TBUFs
past (or other non-LUT resources for wide horizontal muxes).

The xr16 profitably used a TBUF for every LUT site in the datapath.
[http://www.fpgacpu.org/xsoc/cc.html]

The loss isn't so bad once you learn the trick to implement
  o = a + b ? c;
or even
  o = mux(sel1, sel2){a + b, a - b, a & b, a ^ b};
in one LUT per bit. [http://www.fpgacpu.org/log/nov00.html#001112]
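As a purely behavioral reference (Python, for illustration only; the select encoding is my assumption, and the actual LUT/carry mapping is what Jan's log entry covers), the 4-way version computes:

```python
# Behavioral model of the 4-way ALU-result mux Jan describes.  This
# models only the function, not the one-LUT-per-bit hardware mapping.

MASK = 0xFFFF  # assume a 16-bit datapath, as in the xr16

def alu_mux(sel1, sel2, a, b):
    """o = mux(sel1, sel2){a + b, a - b, a & b, a ^ b}"""
    # Select encoding (sel1 as the high bit) is an assumption here.
    ops = [a + b, a - b, a & b, a ^ b]
    return ops[(sel1 << 1) | sel2] & MASK

print(hex(alu_mux(0, 0, 0x1234, 0x0001)))  # a + b -> 0x1235
print(hex(alu_mux(1, 1, 0xF0F0, 0xFF00)))  # a ^ b -> 0xff0
```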

Jan Gray
Gray Research LLC


I wrote:

> o = a + b ? c;

Oops. I meant: o = sel ? (a + b) : c;

Jan.
Jan,

Also, if you put one block RAM per processor, you get an area of at
least 8x20 CLBs for each block RAM.  I don't miss the TBUFs as much as I
thought I would... most of the time.

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930   Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

"They that give up essential liberty to obtain a little temporary safety
deserve neither liberty nor safety."  -Benjamin Franklin, 1759
Oh yeah, one other trick that sometimes helps.  If your resets are
available on the flip-flops leading into a 4:1 mux, you can use the
resets as the select, so that the mux reduces to a 4-input OR.  It
sometimes works for pipelined stuff, but is probably not good for your
processor.
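A quick Python sketch of Ray's trick (bit-level functional model; the function names are mine): if each register is held at 0 whenever its source isn't selected, the 4:1 mux collapses into a plain OR.

```python
# Model of the reset-as-select trick.  Each flip-flop feeding the "mux"
# is synchronously reset to 0 unless its source is selected; the 4:1
# mux then degenerates into a 4-input OR per bit.

def registered_or_mux(sel, inputs):
    """sel: index 0..3 of the selected source; inputs: four bus values."""
    # The "registers": each holds its input only when selected, else 0
    # (the synchronous reset is doing the select work here).
    regs = [d if i == sel else 0 for i, d in enumerate(inputs)]
    # The former 4:1 mux is now just an OR of the register outputs.
    out = 0
    for r in regs:
        out |= r
    return out

print(hex(registered_or_mux(2, [0xAAAA, 0xBBBB, 0xCCCC, 0xDDDD])))  # 0xcccc
```

The catch, as Ray notes, is that the select must be known a cycle early (it gates the registers, not the mux), which suits pipelines better than a processor's result bus.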

"Ray Andraka" <ray@andraka.com> wrote in message
news:40C067CE.C332F829@andraka.com...
> Also, if you put one block RAM per processor, you get an area of at
> least 8x20 CLBs for each block RAM.  I don't miss the TBUFs as much as
> I thought I would... most of the time.

Ray, there are (not coincidentally) 4Rx6C of CLB / BRAM+mult in
Virtex-II Pro devices, yes?  And up to 444 BRAMs per device? :-)

Also, for good old XCV600E (NB: half as many slices per CLB), I used
8Rx6C per processor, floorplanning 60 16-bit CPU + BRAM tiles or 36
32-bit CPUs + 2 BRAM tiles.
[http://www.fpgacpu.org/log/mar02.html#020302]

TBUFs R.I.P.

Jan Gray
Gray Research LLC
> Also, if you put one block RAM per processor, you get an area of at
> least 8x20 CLBs for each block RAM.  I don't miss the TBUFs as much as
> I thought I would... most of the time.
One interesting aspect of the TBUFs is that they went onto long lines,
which were, well, long.  That helped simplify floorplanning.

Assume that I have a design in mind where I would have used TBUFs.  Is
there some layout pattern that works well after I switch to using muxes?
Do I just toss it on the chip in some sensible-looking way and assume
the routing will be good enough?  What if I'm pushing the speed or
density envelope?

I guess I'm slightly surprised that some quirky feature hasn't evolved
to replace that niche - something like a 2:1 mux or 2-input OR tied to
special routing (with a pitch to match an adder using the dedicated
carry logic).  Maybe the routing is just good enough for the old type of
design, and newer chips are big enough that the typical design is a
different sort of project.

--
The suespammers.org mail server is located in California.  So are all my
other mailboxes.  Please do not send unsolicited bulk e-mail or
unsolicited commercial e-mail to my suespammers.org address or any of my
other addresses.  These are my opinions, not necessarily my employer's.
I hate spam.
Hal Murray wrote:
> Assume that I have a design in mind where I would have used TBUFs.
> Is there some layout pattern that works well after I switch to
> using muxes? [...]
> I guess I'm slightly surprised that some quirky feature hasn't
> evolved to replace that niche - something like a 2:1 mux or 2-input
> OR tied to special routing.
Keep in mind that the newer Xilinx chips have a MUXF6, which allows up
to 8-input muxes to be made with a single level of delay.  That compares
well with the 16-input mux you can make from an Altera LAB.  Routing is
an issue, but the speed of the TBUFs driving long lines makes them
pretty impractical for the newer chips running at high speeds.

If you don't need speed, you can use a single wire with a serial bus to
reduce the amount of logic and routing used.  What the newer chips
provide is speed, and lots of it.  That can do a lot to reduce the size
of a design.

--
Rick "rickman" Collins
rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design
URL http://www.arius.com
4 King Ave                        301-682-7772 Voice
Frederick, MD 21701-3110          301-682-7666 FAX
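The F5/F6-style 8:1 mux tree rickman mentions can be sketched functionally in Python (the decomposition is the point; MUXF5/MUXF6 are the resources named in the thread, the code itself is my illustration): four LUT-sized 2:1 muxes feed two MUXF5s, and one MUXF6 picks the final result.

```python
# Functional sketch of an F5/F6-style 8:1 mux tree (illustration only).
# Four 2:1 "LUT" muxes -> two "MUXF5" -> one "MUXF6".

def mux2(sel, a, b):
    return b if sel else a

def mux8_f6_tree(s2, s1, s0, d):
    """d: list of 8 one-bit inputs; s2 is the select MSB."""
    # Level 1: four LUT-sized 2:1 muxes, all switched by s0
    l = [mux2(s0, d[2 * i], d[2 * i + 1]) for i in range(4)]
    # Level 2: two MUXF5s, switched by s1
    f5a = mux2(s1, l[0], l[1])
    f5b = mux2(s1, l[2], l[3])
    # Level 3: one MUXF6, switched by s2
    return mux2(s2, f5a, f5b)

d = [0, 1, 0, 0, 1, 1, 0, 1]
print(mux8_f6_tree(1, 1, 1, d))  # selects d[7] -> 1
```

The dedicated F5/F6 muxes are what keep the upper tree levels off the general routing, which is where the "single level of delay" claim comes from.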