Reply by Andy September 6, 2013
The standard data type (std_logic) is tri-statable in VHDL, so that would be the preferred choice, rather than WAND or WOR. It does come in handy in that a single bidirectional port in RTL can represent both input and output wires, and part of the mux, at the gate level.
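For illustration, a minimal VHDL sketch of such a port (entity and signal names are hypothetical, not from the original post): one inout stands in for the input wire, the output wire, and the enable.

  library ieee;
  use ieee.std_logic_1164.all;

  entity bidir_pin is
    port (
      pad : inout std_logic;  -- single RTL port: input + output + enable
      oe  : in    std_logic;  -- drive enable
      d_o : in    std_logic;  -- value to drive out
      d_i : out   std_logic   -- value sampled from the pad
    );
  end entity bidir_pin;

  architecture rtl of bidir_pin is
  begin
    pad <= d_o when oe = '1' else 'Z';  -- 'Z' releases the wire
    d_i <= pad;                         -- the input side is always visible
  end architecture rtl;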

Tri-state bidirectional ports allow distributed address decoding in the RTL (give a module the address and a generic to tell it what addresses to respond to), even though at the gate level it will all get optimized together at the muxes.
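A hedged sketch of that scheme (BASE_ADDR and the other names are hypothetical): each slave compares the shared address against its generic and gets off the bus with 'Z' otherwise, and synthesis flattens the resolved bus into ordinary mux logic.

  library ieee;
  use ieee.std_logic_1164.all;

  entity bus_slave is
    generic (
      BASE_ADDR : std_logic_vector(7 downto 0) := x"40"  -- address this module answers to
    );
    port (
      addr      : in    std_logic_vector(7 downto 0);
      local_q   : in    std_logic_vector(31 downto 0);
      read_data : inout std_logic_vector(31 downto 0)    -- shared, resolved bus
    );
  end entity bus_slave;

  architecture rtl of bus_slave is
  begin
    -- drive only when addressed; otherwise release the bus
    read_data <= local_q when addr = BASE_ADDR else (others => 'Z');
  end architecture rtl;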

Some synthesis tools can even "register" tri-state values to allow you to simplify pipelining in the RTL. Synthesis takes care of the details of separating out the tri-state enable from the data, and registering both appropriately.
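Continuing the hypothetical sketch above (same names, with a clk added to the ports), a registered tri-state drive might look like this; synthesis would separate it into a data register plus an enable register feeding the final mux:

  -- assumes clk, addr, BASE_ADDR, local_q, read_data as declared above
  process (clk)
  begin
    if rising_edge(clk) then
      if addr = BASE_ADDR then
        read_data <= local_q;          -- registered data while selected
      else
        read_data <= (others => 'Z');  -- registered release of the bus
      end if;
    end if;
  end process;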

Andy
Reply by glen herrmannsfeldt September 5, 2013
Mark Curry <gtwrek@sonic.net> wrote:

(snip, I wrote)
>>As far as I know, they are still implemented by the synthesis
>>tools as either OR or AND logic. I don't know any reason to remove
>>that ability, as it doesn't depend on the hardware. Then again, it
>>isn't hard to write the logic directly.
> We do this now in verilog - declare our read data bus
> (and similar signals) as "wor" nets. Then you can tie them
> all together as needed. Saves you the hassle of actually
> creating/managing individual return data, and muxxing it all.

> The individual modules must take care to drive 0's on the
> read_data when not in use. Then you're really creating
> multi-source signals (like past bus structures), but
> relying on the "wor" to resolve the net.
I think you can also do it with traditional tri-state gates, but pretty much the same as AND with the enable, and then onto the WOR line.
> Works in Xilinx XST and Synplicity. Don't know about others.
> Don't know if this trick would work in VHDL.
I can usually read VHDL but don't claim to write it. -- glen
Reply by Mark Curry September 5, 2013
In article <l08jta$9m8$1@speranza.aioe.org>,
glen herrmannsfeldt  <gah@ugcs.caltech.edu> wrote:
>Brian Davis <brimdavis@aol.com> wrote:
>> Gabor wrote:
>(snip)
>>> Modern FPGA's use buffered routing, and tristates don't
>>> match up with that
>
>> I think I once read that the last generation or few of
>> TBUF's were actually implemented with dedicated muxes/wired
>> OR's, or something similar.
>
>As far as I know, they are still implemented by the synthesis
>tools as either OR or AND logic. I don't know any reason to remove
>that ability, as it doesn't depend on the hardware. Then again, it
>isn't hard to write the logic directly.
We do this now in verilog - declare our read data bus (and similar signals) as "wor" nets. Then you can tie them all together as needed. Saves you the hassle of actually creating/managing individual return data, and muxxing it all.

The individual modules must take care to drive 0's on the read_data when not in use. Then you're really creating multi-source signals (like past bus structures), but relying on the "wor" to resolve the net.

Works in Xilinx XST and Synplicity. Don't know about others. Don't know if this trick would work in VHDL.

--Mark
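For illustration, a minimal Verilog sketch of the trick Mark describes (module and signal names are hypothetical): the bus is declared "wor", each source gates its data with its select, and the net's OR resolution does the muxing.

  module wor_bus_demo (
    input  wire        sel_a, sel_b,
    input  wire [31:0] data_a, data_b,
    output wor  [31:0] read_data      // multi-source net, resolved by OR
  );
    // each source must drive 0's when not selected
    assign read_data = sel_a ? data_a : 32'd0;
    assign read_data = sel_b ? data_b : 32'd0;
  endmodule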
Reply by glen herrmannsfeldt September 4, 2013
Brian Davis <brimdavis@aol.com> wrote:
> Gabor wrote:
(snip)
>> Modern FPGA's use buffered routing, and tristates don't
>> match up with that
> I think I once read that the last generation or few of
> TBUF's were actually implemented with dedicated muxes/wired
> OR's, or something similar.
As far as I know, they are still implemented by the synthesis tools as either OR or AND logic. I don't know any reason to remove that ability, as it doesn't depend on the hardware. Then again, it isn't hard to write the logic directly.

-- glen
Reply by Brian Davis September 4, 2013
Gabor wrote:
> I think TBUFs went away along with "long lines" due to capacitive delay
I appreciate the rationale. Yet still I miss their functionality for processor designs. [ "Lament of the TBUF" would make an excellent dirge title ]
> Modern FPGA's use buffered routing, and tristates don't match up with that
I think I once read that the last generation or few of TBUF's were actually implemented with dedicated muxes/wired OR's, or something similar.

I wish that had been continued on a reduced scale: TBUF's every 4 or 8 columns, matching the carry chain pitch, spanning some horizontal fraction of a clock region.

-Brian
Reply by glen herrmannsfeldt September 4, 2013
rickman <gnuarm@gmail.com> wrote:
>> " > > Yes, that is what we are discussing. Why did *Xilinx* give out the > family jewels to Lucent? We know it happened, the question is *why*?
(snip)
>> Yes, that's why I miss the TBUF's :)
>> In the XC4000/Virtex days, the same 32 bit core fit into
>> 300-400 LUT4's, and a good number of TBUF's.

>> The growth to ~800 LUT4 is split between the TBUF
>> replacement muxes and new instruction set features.
> My understanding is that TBUFs may have been a good idea when LUT delays
> were 5 nS and routing was another 5 to 10 between LUTs, but as they made
> the devices more dense and faster they found the TBUFs just didn't scale
> in the same way, in fact the speed got worse! The capacitance being
> driven didn't go down much and the TBUFs needed to scale which means
> they had less drive. So they would have actually gotten slower. No,
> they are gone because TBUFs just aren't your friend when you want to
> make a dense, fast chip.
That is probably enough, but it is actually worse than that. At about 0.8 micron, the wiring has to use a distributed RC model.

Above that, you can treat it as driving a capacitor with a current source. All points are, close enough, at the same voltage, and the only thing that matters is what that voltage is. (LC delay is pretty low.)

Below 0.8 micron, besides the fact that the lines are getting longer, the resistance is also significant. It is then modeled as series resistors and capacitors to ground, all the way down the line. (As well as I remember, the inductance is less significant than the resistance, but I haven't thought about it that closely for a while now.)

-- glen
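To make the scaling concrete, the standard Elmore approximation for such an RC ladder (a textbook result, not something glen stated): with resistance r and capacitance c per unit length, an unbuffered line of length L modeled as N series-R, shunt-C segments has delay

\[
t_D \;\approx\; \sum_{i=1}^{N} R_i \sum_{j=i}^{N} C_j \;\longrightarrow\; \tfrac{1}{2}\, r\, c\, L^{2} \qquad (N \to \infty)
\]

which grows as the square of the length. Breaking the route into k buffered segments cuts the wire term to roughly rcL^2/(2k), which is why buffered routing wins once resistance matters.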
Reply by GaborSzakacs September 4, 2013
rickman wrote:
[snip]
> My understanding is that TBUFs may have been a good idea when LUT delays
> were 5 nS and routing was another 5 to 10 between LUTs, but as they made
> the devices more dense and faster they found the TBUFs just didn't scale
> in the same way, in fact the speed got worse! The capacitance being
> driven didn't go down much and the TBUFs needed to scale which means
> they had less drive. So they would have actually gotten slower. No,
> they are gone because TBUFs just aren't your friend when you want to
> make a dense, fast chip.
I think TBUFs went away along with "long lines" due to capacitive delay as you noted. Modern FPGA's use buffered routing, and tristates don't match up with that sort of routing network since the buffered routes become unidirectional. The silicon for line drivers is now much faster than routing prop delays, making the buffered network faster than a single point driving all that line capacitance. So the new parts have drivers in every switch box instead of just pass FETs.

I think the original Virtex line was the first to use buffered routing, part of the Dyna-Chip acquisition by Xilinx. They still had long lines and TBUFs, but that went away on Virtex 2.

-- Gabor
Reply by rickman September 3, 2013
On 9/3/2013 6:27 PM, Brian Davis wrote:
> rickman wrote:
>>
>>>> I'll never understand why they licensed their products to Lucent.
>>>
>>> I'd reckon AT&T/Lucent had a large semiconductor patent
>>> portfolio with which to apply strategic "leverage" for a
>>> favorable cross-licensing agreement.
>>
>> Possible, but I don't think so. Any number of folks could
>> have had semiconductor patents and no one else got anything
>> like this. I would speculate that Xilinx needed a second source
>>
>
> There was definitely a second source in the XC3000 days,
> first from MMI (bought by AMD), later AT&T; but I don't
> remember there being anyone second sourcing the XC4000
>
> IIRC, as Xilinx introduced the XC4000, AT&T went their
> own way in the ORCA, with similar features (distributed RAM,
> carry chains), but using the Neocad software.
>
> My speculation is that at this juncture, AT&T leveraged
> rights to the Xilinx FPGA patents.
>
> Back in 1995, the AT&T press release responding to the
> Neocad acquisition was re-posted here:
>
> https://groups.google.com/forum/message/raw?msg=comp.arch.fpga/Oa92_X3iDao/w63G0Z4dlCcJ
>
> and stated:
> "
> " When AT&T Microelectronics decided not to second source
> " the Xilinx 4000 family of FPGAs, we accelerated the
> " introduction of the ORCA family.
> "
Yes, that is what we are discussing. Why did *Xilinx* give out the family jewels to Lucent? We know it happened, the question is *why*?
> -----------------
>
>> The trick to datapaths in CPU designs is to minimize
>> the number of inputs onto a "bus" which is implemented
>> as multiplexers.
>
> Yes, that's why I miss the TBUF's :)
>
> In the XC4000/Virtex days, the same 32 bit core fit into
> 300-400 LUT4's, and a good number of TBUF's.
>
> The growth to ~800 LUT4 is split between the TBUF
> replacement muxes and new instruction set features.
My understanding is that TBUFs may have been a good idea when LUT delays were 5 nS and routing was another 5 to 10 between LUTs, but as they made the devices more dense and faster they found the TBUFs just didn't scale in the same way, in fact the speed got worse! The capacitance being driven didn't go down much and the TBUFs needed to scale which means they had less drive. So they would have actually gotten slower. No, they are gone because TBUFs just aren't your friend when you want to make a dense, fast chip.
>> Why did you roll your own RISC design when each FPGA
>> maker has their own?
>
> When the YARD core blinked its first LED in 1999,
> there wasn't much in the way of free vendor RISC IP.
>
> Being a perpetually-unfinished spare-time project,
> I never got enough loose ends tidied up enough to
> make the sources available until recently.
Ok, that makes sense. I rolled my first CPU around 2002 and, like yours, it may have been used, but still is not finished.
>> The Lattice version is even open source.
>
> At the initial announcement, yes; but when I looked
> a couple years ago, the Lattice Mico source files
> had been lawyered up with a "Lattice Devices Only"
> clause, see the comments on this thread:
>
> http://latticeblogs.typepad.com/frontier/2006/08/open_source.html
Oh, that is a horse of a different color. So the Lattice CPU designs are out! No big loss. The 8 bitter doesn't have a C compiler (not that I care) and good CPU designs are a dime a dozen... I guess, depending on your definition of "good". -- Rick
Reply by Brian Davis September 3, 2013
rickman wrote:
>>> I'll never understand why they licensed their products to Lucent.
>>
>> I'd reckon AT&T/Lucent had a large semiconductor patent
>> portfolio with which to apply strategic "leverage" for a
>> favorable cross-licensing agreement.
>
> Possible, but I don't think so. Any number of folks could
> have had semiconductor patents and no one else got anything
> like this. I would speculate that Xilinx needed a second source
There was definitely a second source in the XC3000 days, first from MMI (bought by AMD), later AT&T; but I don't remember there being anyone second sourcing the XC4000.

IIRC, as Xilinx introduced the XC4000, AT&T went their own way in the ORCA, with similar features (distributed RAM, carry chains), but using the Neocad software.

My speculation is that at this juncture, AT&T leveraged rights to the Xilinx FPGA patents.

Back in 1995, the AT&T press release responding to the Neocad acquisition was re-posted here:

https://groups.google.com/forum/message/raw?msg=comp.arch.fpga/Oa92_X3iDao/w63G0Z4dlCcJ

and stated:

" When AT&T Microelectronics decided not to second source
" the Xilinx 4000 family of FPGAs, we accelerated the
" introduction of the ORCA family.

-----------------
> The trick to datapaths in CPU designs is to minimize
> the number of inputs onto a "bus" which is implemented
> as multiplexers.
Yes, that's why I miss the TBUF's :)

In the XC4000/Virtex days, the same 32 bit core fit into 300-400 LUT4's, and a good number of TBUF's.

The growth to ~800 LUT4 is split between the TBUF replacement muxes and new instruction set features.
> Why did you roll your own RISC design when each FPGA
> maker has their own?
When the YARD core blinked its first LED in 1999, there wasn't much in the way of free vendor RISC IP.

Being a perpetually-unfinished spare-time project, I never got enough loose ends tidied up enough to make the sources available until recently.
> The Lattice version is even open source.
At the initial announcement, yes; but when I looked a couple years ago, the Lattice Mico source files had been lawyered up with a "Lattice Devices Only" clause; see the comments on this thread:

http://latticeblogs.typepad.com/frontier/2006/08/open_source.html

-Brian
Reply by rickman September 3, 2013
On 9/2/2013 9:56 PM, Brian Davis wrote:
> rickman wrote:
>
>> I have been looking at these parts for some time and I never
>> realized they don't include distributed RAM using the LUTs.
>
> Also of note, the ICE40 Block RAM's two ports consist of
> one read-only port, and one write-only port; vs. the two
> independent read+write ports of many other FPGA families.
The iCE family of products has a number of shortcomings compared to the larger parts sold elsewhere, but for a reason: the iCE lines are very, very low power. You can't do that if you have a lot of "fat" in the hardware. So they cut to the bone. This is not the only area where the parts are a little short.

The question is how much does it matter? For a long time I've heard how brand X or A or whatever is better because of this feature or that feature. So the iCE line has few of these fancy features; how well do designs work in them?
>> Lattice has a license on many Xilinx owned patents because
>> they bought the Orca line from Lucent who had gotten all
>> sorts of licensing from Xilinx in a weak moment.
> <snip>
>> I'll never understand why they licensed their products to Lucent.
>
> I'd reckon AT&T/Lucent had a large semiconductor patent
> portfolio with which to apply strategic "leverage" for a
> favorable cross-licensing agreement.
Possible, but I don't think so. Any number of folks could have had semiconductor patents and no one else got anything like this. I would speculate that Xilinx needed a second source for some huge customer or maybe they were at a critical point in the company's growth and just needed a bunch of cash (as opposed to cache). Who knows?
>> If the processor were integrated into the FPGA, then we
>> are back to a single simulation, schweet!
>
> As a yardstick, a system build for my homebrew RISC,
> including 4 Kbyte BRAM, UART and I/O, fits snugly into
> one of the 1280 LUT4 XO2 devices:
>
> : Number of logic LUT4s: 890
> : Number of distributed RAM: 66 (132 LUT4s)
> : Number of ripple logic: 110 (220 LUT4s)
> : Number of shift registers: 0
> : Total number of LUT4s: 1242
> :
> : Number of block RAMs: 4 out of 7 (57%)
>
> The core proper (32 bit datapath, 16 bit instructions)
> is currently ~800 LUT4 in its default configuration.
> [ I miss TBUF's when working on processor datapaths. ]
>
> I don't have the XO2 design checked in, but the similar
> XP2 version is in the following code repository, under
> trunk/hdl/systems/evb_lattice_xp2_brevia :
>
> http://code.google.com/p/yard-1/
>
> The above is still very much a work-in-progress, but
> far enough along to use for small assembly projects
> ( note that interrupts are currently broken ).
The trick to datapaths in CPU designs is to minimize the number of inputs onto a "bus" which is implemented as multiplexers. Minimizing inputs gains speed and minimizes logic. When possible, put the muxes inside some RAM on the chip to good use. I got sidetracked on my last iteration of a CPU design, which was going to use a block RAM as the "register file" and stack in one. Since then I've read about some other designs which use similar ideas, although not identical.

Why did you roll your own RISC design when each FPGA maker has their own? The Lattice version is even open source.

-- Rick
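As a sketch of the RAM-as-register-file idea (a common inference pattern; names are hypothetical, not rickman's actual design): with the registers in a RAM, the read-address decode inside the RAM replaces what would otherwise be a wide operand mux.

  library ieee;
  use ieee.std_logic_1164.all;
  use ieee.numeric_std.all;

  entity regfile is
    port (
      clk   : in  std_logic;
      we    : in  std_logic;
      waddr : in  unsigned(4 downto 0);
      wdata : in  std_logic_vector(31 downto 0);
      raddr : in  unsigned(4 downto 0);
      rdata : out std_logic_vector(31 downto 0)
    );
  end entity regfile;

  architecture rtl of regfile is
    type ram_t is array (0 to 31) of std_logic_vector(31 downto 0);
    signal ram : ram_t;
  begin
    process (clk)
    begin
      if rising_edge(clk) then
        if we = '1' then
          ram(to_integer(waddr)) <= wdata;  -- write port
        end if;
        rdata <= ram(to_integer(raddr));    -- registered read, maps into block RAM
      end if;
    end process;
  end architecture rtl;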