Reply by Andy September 6, 2013
The standard data type (std_logic) is tri-statable in VHDL, so that would be the preferred choice, rather than WAND or WOR. It does come in handy in that a single bidirectional port in RTL can represent both input and output wires, and part of the mux, at the gate level.
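For illustration, a minimal VHDL sketch of such a port (entity and signal names are hypothetical, not from the original post): one inout stands in for the input wire, the output wire, and the enable.

  library ieee;
  use ieee.std_logic_1164.all;

  entity bidir_pin is
    port (
      pad : inout std_logic;  -- single RTL port: input + output + enable
      oe  : in    std_logic;  -- drive enable
      d_o : in    std_logic;  -- value to drive out
      d_i : out   std_logic   -- value sampled from the pad
    );
  end entity bidir_pin;

  architecture rtl of bidir_pin is
  begin
    pad <= d_o when oe = '1' else 'Z';  -- 'Z' releases the wire
    d_i <= pad;                         -- the input side is always visible
  end architecture rtl;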

Tri-state bidirectional ports allow distributed address decoding in the RTL (give a module the address and a generic to tell it what addresses to respond to), even though at the gate level it will all get optimized together at the muxes.
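A hedged sketch of that scheme (BASE_ADDR and the other names are hypothetical): each slave compares the shared address against its generic and gets off the bus with 'Z' otherwise, and synthesis flattens the resolved bus into ordinary mux logic.

  library ieee;
  use ieee.std_logic_1164.all;

  entity bus_slave is
    generic (
      BASE_ADDR : std_logic_vector(7 downto 0) := x"40"  -- address this module answers to
    );
    port (
      addr      : in    std_logic_vector(7 downto 0);
      local_q   : in    std_logic_vector(31 downto 0);
      read_data : inout std_logic_vector(31 downto 0)    -- shared, resolved bus
    );
  end entity bus_slave;

  architecture rtl of bus_slave is
  begin
    -- drive only when addressed; otherwise release the bus
    read_data <= local_q when addr = BASE_ADDR else (others => 'Z');
  end architecture rtl;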

Some synthesis tools can even "register" tri-state values to allow you to simplify pipelining in the RTL. Synthesis takes care of the details of separating out the tri-state enable from the data, and registering both appropriately.
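Continuing the hypothetical sketch above (same names, with a clk added to the ports), a registered tri-state drive might look like this; synthesis would separate it into a data register plus an enable register feeding the final mux:

  -- assumes clk, addr, BASE_ADDR, local_q, read_data as declared above
  process (clk)
  begin
    if rising_edge(clk) then
      if addr = BASE_ADDR then
        read_data <= local_q;          -- registered data while selected
      else
        read_data <= (others => 'Z');  -- registered release of the bus
      end if;
    end if;
  end process;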

Andy
Reply by glen herrmannsfeldt September 5, 2013
Mark Curry <gtwrek@sonic.net> wrote:

(snip, I wrote)
>>As far as I know, they are still implemented by the synthesis
>>tools as either OR or AND logic. I don't know any reason to remove
>>that ability, as it doesn't depend on the hardware. Then again, it
>>isn't hard to write the logic directly.
> We do this now in verilog - declare our read data bus
> (and similar signals) as "wor" nets. Then you can tie them
> all together as needed. Saves you the hassle of actually
> creating/managing individual return data, and muxxing it all.

> The individual modules must take care to drive 0's on the
> read_data when not in use. Then you're really creating
> multi-source signals (like past bus structures), but
> relying on the "wor" to resolve the net.
I think you can also do it with traditional tri-state gates, but pretty much the same as AND with the enable, and then onto the WOR line.
> Works in Xilinx XST and Synplicity. Don't know about others.
> Don't know if this trick would work in VHDL.
I can usually read VHDL but don't claim to write it. -- glen
Reply by Mark Curry September 5, 2013
In article <l08jta$9m8$1@speranza.aioe.org>,
glen herrmannsfeldt  <gah@ugcs.caltech.edu> wrote:
>Brian Davis <brimdavis@aol.com> wrote:
>> Gabor wrote:
>(snip)
>>> Modern FPGA's use buffered routing, and tristates don't
>>> match up with that
>
>> I think I once read that the last generation or few of
>> TBUF's were actually implemented with dedicated muxes/wired
>> OR's, or something similar.
>
>As far as I know, they are still implemented by the synthesis
>tools as either OR or AND logic. I don't know any reason to remove
>that ability, as it doesn't depend on the hardware. Then again, it
>isn't hard to write the logic directly.
We do this now in verilog - declare our read data bus (and similar signals) as "wor" nets. Then you can tie them all together as needed. Saves you the hassle of actually creating/managing individual return data, and muxxing it all.

The individual modules must take care to drive 0's on the read_data when not in use. Then you're really creating multi-source signals (like past bus structures), but relying on the "wor" to resolve the net.

Works in Xilinx XST and Synplicity. Don't know about others. Don't know if this trick would work in VHDL.

--Mark
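For illustration, a minimal Verilog sketch of the trick Mark describes (module and signal names are hypothetical): the bus is declared "wor", each source gates its data with its select, and the net's OR resolution does the muxing.

  module wor_bus_demo (
    input  wire        sel_a, sel_b,
    input  wire [31:0] data_a, data_b,
    output wor  [31:0] read_data      // multi-source net, resolved by OR
  );
    // each source must drive 0's when not selected
    assign read_data = sel_a ? data_a : 32'd0;
    assign read_data = sel_b ? data_b : 32'd0;
  endmodule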
Reply by glen herrmannsfeldt September 4, 2013
Brian Davis <brimdavis@aol.com> wrote:
> Gabor wrote:
(snip)
>> Modern FPGA's use buffered routing, and tristates don't
>> match up with that
> I think I once read that the last generation or few of
> TBUF's were actually implemented with dedicated muxes/wired
> OR's, or something similar.
As far as I know, they are still implemented by the synthesis tools as either OR or AND logic. I don't know any reason to remove that ability, as it doesn't depend on the hardware. Then again, it isn't hard to write the logic directly.

-- glen
Reply by Brian Davis September 4, 2013
Gabor wrote:
> I think TBUFs went away along with "long lines" due to capacitive delay
I appreciate the rationale. Yet still I miss their functionality for processor designs. [ "Lament of the TBUF" would make an excellent dirge title ]
> Modern FPGA's use buffered routing, and tristates don't match up with that
I think I once read that the last generation or few of TBUF's were actually implemented with dedicated muxes/wired OR's, or something similar.

I wish that had been continued on a reduced scale: TBUF's every 4 or 8 columns, matching the carry chain pitch, spanning some horizontal fraction of a clock region.

-Brian
Reply by glen herrmannsfeldt September 4, 2013
rickman <gnuarm@gmail.com> wrote:
>> " > > Yes, that is what we are discussing. Why did *Xilinx* give out the > family jewels to Lucent? We know it happened, the question is *why*?
(snip)
>> Yes, that's why I miss the TBUF's :)
>> In the XC4000/Virtex days, the same 32 bit core fit into
>> 300-400 LUT4's, and a good number of TBUF's.

>> The growth to ~800 LUT4 is split between the TBUF
>> replacement muxes and new instruction set features.
> My understanding is that TBUFs may have been a good idea when LUT delays
> were 5 nS and routing was another 5 to 10 between LUTs, but as they made
> the devices more dense and faster they found the TBUFs just didn't scale
> in the same way, in fact the speed got worse! The capacitance being
> driven didn't go down much and the TBUFs needed to scale which means
> they had less drive. So they would have actually gotten slower. No,
> they are gone because TBUFs just aren't your friend when you want to
> make a dense, fast chip.
That is probably enough, but it is actually worse than that. At about 0.8 micron, the wiring has to use a distributed RC model.

Above that, you can treat it as driving a capacitor with a current source. All points are, close enough, at the same voltage, and the only thing that matters is what that voltage is. (LC delay is pretty low.)

Below 0.8 micron, besides the fact that the lines are getting longer, the resistance is also significant. It is then modeled as series resistors and capacitors to ground, all the way down the line. (As well as I remember, the inductance is less significant than the resistance, but I haven't thought about it that closely for a while now.)

-- glen
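To make the scaling concrete, the standard Elmore approximation for such an RC ladder (a textbook result, not something glen stated): with resistance r and capacitance c per unit length, an unbuffered line of length L modeled as N series-R, shunt-C segments has delay

\[
t_D \;\approx\; \sum_{i=1}^{N} R_i \sum_{j=i}^{N} C_j \;\longrightarrow\; \tfrac{1}{2}\, r\, c\, L^{2} \qquad (N \to \infty)
\]

which grows as the square of the length. Breaking the route into k buffered segments cuts the wire term to roughly rcL^2/(2k), which is why buffered routing wins once resistance matters.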
Reply by GaborSzakacs September 4, 2013
rickman wrote:
[snip]
> My understanding is that TBUFs may have been a good idea when LUT delays
> were 5 nS and routing was another 5 to 10 between LUTs, but as they made
> the devices more dense and faster they found the TBUFs just didn't scale
> in the same way, in fact the speed got worse! The capacitance being
> driven didn't go down much and the TBUFs needed to scale which means
> they had less drive. So they would have actually gotten slower. No,
> they are gone because TBUFs just aren't your friend when you want to
> make a dense, fast chip.
I think TBUFs went away along with "long lines" due to capacitive delay as you noted. Modern FPGA's use buffered routing, and tristates don't match up with that sort of routing network since the buffered routes become unidirectional. The silicon for line drivers is now much faster than routing prop delays, making the buffered network faster than a single point driving all that line capacitance. So the new parts have drivers in every switch box instead of just pass FETs.

I think the original Virtex line was the first to use buffered routing, part of the Dyna-Chip acquisition by Xilinx. They still had long lines and TBUFs, but that went away on Virtex 2.

-- Gabor
Reply by rickman September 3, 2013
On 9/3/2013 6:27 PM, Brian Davis wrote:
> rickman wrote:
>>
>>>> I'll never understand why they licensed their products to Lucent.
>>>
>>> I'd reckon AT&T/Lucent had a large semiconductor patent
>>> portfolio with which to apply strategic "leverage" for a
>>> favorable cross-licensing agreement.
>>
>> Possible, but I don't think so. Any number of folks could
>> have had semiconductor patents and no one else got anything
>> like this. I would speculate that Xilinx needed a second source
>>
>
> There was definitely a second source in the XC3000 days,
> first from MMI (bought by AMD), later AT&T; but I don't
> remember there being anyone second sourcing the XC4000
>
> IIRC, as Xilinx introduced the XC4000, AT&T went their
> own way in the ORCA, with similar features (distributed RAM,
> carry chains), but using the Neocad software.
>
> My speculation is that at this juncture, AT&T leveraged
> rights to the Xilinx FPGA patents.
>
> Back in 1995, the AT&T press release responding to the
> Neocad acquisition was re-posted here:
>
> https://groups.google.com/forum/message/raw?msg=comp.arch.fpga/Oa92_X3iDao/w63G0Z4dlCcJ
>
> and stated:
> "
> " When AT&T Microelectronics decided not to second source
> " the Xilinx 4000 family of FPGAs, we accelerated the
> " introduction of the ORCA family.
> "
Yes, that is what we are discussing. Why did *Xilinx* give out the family jewels to Lucent? We know it happened, the question is *why*?
> -----------------
>
>> The trick to datapaths in CPU designs is to minimize
>> the number of inputs onto a "bus" which is implemented
>> as multiplexers.
>
> Yes, that's why I miss the TBUF's :)
>
> In the XC4000/Virtex days, the same 32 bit core fit into
> 300-400 LUT4's, and a good number of TBUF's.
>
> The growth to ~800 LUT4 is split between the TBUF
> replacement muxes and new instruction set features.
My understanding is that TBUFs may have been a good idea when LUT delays were 5 nS and routing was another 5 to 10 between LUTs, but as they made the devices more dense and faster they found the TBUFs just didn't scale in the same way, in fact the speed got worse! The capacitance being driven didn't go down much and the TBUFs needed to scale which means they had less drive. So they would have actually gotten slower. No, they are gone because TBUFs just aren't your friend when you want to make a dense, fast chip.
>> Why did you roll your own RISC design when each FPGA
>> maker has their own?
>
> When the YARD core blinked its first LED in 1999,
> there wasn't much in the way of free vendor RISC IP.
>
> Being a perpetually-unfinished spare-time project,
> I never got enough loose ends tidied up enough to
> make the sources available until recently.
Ok, that makes sense. I rolled my first CPU around 2002 and, like yours, it may have been used, but still is not finished.
>> The Lattice version is even open source.
>
> At the initial announcement, yes; but when I looked
> a couple years ago, the Lattice Mico source files
> had been lawyered up with a "Lattice Devices Only"
> clause, see the comments on this thread:
>
> http://latticeblogs.typepad.com/frontier/2006/08/open_source.html
Oh, that is a horse of a different color. So the Lattice CPU designs are out! No big loss. The 8 bitter doesn't have a C compiler (not that I care) and good CPU designs are a dime a dozen... I guess, depending on your definition of "good". -- Rick
Reply by Brian Davis September 3, 2013
rickman wrote:
>>> I'll never understand why they licensed their products to Lucent.
>>
>> I'd reckon AT&T/Lucent had a large semiconductor patent
>> portfolio with which to apply strategic "leverage" for a
>> favorable cross-licensing agreement.
>
> Possible, but I don't think so. Any number of folks could
> have had semiconductor patents and no one else got anything
> like this. I would speculate that Xilinx needed a second source
There was definitely a second source in the XC3000 days, first from MMI (bought by AMD), later AT&T; but I don't remember there being anyone second sourcing the XC4000.

IIRC, as Xilinx introduced the XC4000, AT&T went their own way in the ORCA, with similar features (distributed RAM, carry chains), but using the Neocad software.

My speculation is that at this juncture, AT&T leveraged rights to the Xilinx FPGA patents.

Back in 1995, the AT&T press release responding to the Neocad acquisition was re-posted here:

https://groups.google.com/forum/message/raw?msg=comp.arch.fpga/Oa92_X3iDao/w63G0Z4dlCcJ

and stated:

" When AT&T Microelectronics decided not to second source
" the Xilinx 4000 family of FPGAs, we accelerated the
" introduction of the ORCA family.

-----------------
> The trick to datapaths in CPU designs is to minimize
> the number of inputs onto a "bus" which is implemented
> as multiplexers.
Yes, that's why I miss the TBUF's :)

In the XC4000/Virtex days, the same 32 bit core fit into 300-400 LUT4's, and a good number of TBUF's.

The growth to ~800 LUT4 is split between the TBUF replacement muxes and new instruction set features.
> Why did you roll your own RISC design when each FPGA
> maker has their own?
When the YARD core blinked its first LED in 1999, there wasn't much in the way of free vendor RISC IP.

Being a perpetually-unfinished spare-time project, I never got enough loose ends tidied up enough to make the sources available until recently.
> The Lattice version is even open source.
At the initial announcement, yes; but when I looked a couple years ago, the Lattice Mico source files had been lawyered up with a "Lattice Devices Only" clause; see the comments on this thread:

http://latticeblogs.typepad.com/frontier/2006/08/open_source.html

-Brian
Reply by rickman September 3, 2013
On 9/2/2013 9:56 PM, Brian Davis wrote:
> rickman wrote:
>
>> I have been looking at these parts for some time and I never
>> realized they don't include distributed RAM using the LUTs.
>
> Also of note, the ICE40 Block RAM's two ports consist of
> one read-only port, and one write-only port; vs. the two
> independent read+write ports of many other FPGA families.
The iCE family of products has a number of shortcomings compared to the larger parts sold elsewhere, but for a reason: the iCE lines are very, very low power. You can't do that if you have a lot of "fat" in the hardware. So they cut to the bone. This is not the only area where the parts are a little short.

The question is how much does it matter? For a long time I've heard how brand X or A or whatever is better because of this feature or that feature. So the iCE line has few of these fancy features; how well do designs work in them?
>> Lattice has a license on many Xilinx owned patents because
>> they bought the Orca line from Lucent who had gotten all
>> sorts of licensing from Xilinx in a weak moment.
> <snip>
>> I'll never understand why they licensed their products to Lucent.
>
> I'd reckon AT&T/Lucent had a large semiconductor patent
> portfolio with which to apply strategic "leverage" for a
> favorable cross-licensing agreement.
Possible, but I don't think so. Any number of folks could have had semiconductor patents and no one else got anything like this. I would speculate that Xilinx needed a second source for some huge customer or maybe they were at a critical point in the company's growth and just needed a bunch of cash (as opposed to cache). Who knows?
>> If the processor were integrated into the FPGA, then we
>> are back to a single simulation, schweet!
>
> As a yardstick, a system build for my homebrew RISC,
> including 4 Kbyte BRAM, UART and I/O, fits snugly into
> one of the 1280 LUT4 XO2 devices:
>
> : Number of logic LUT4s: 890
> : Number of distributed RAM: 66 (132 LUT4s)
> : Number of ripple logic: 110 (220 LUT4s)
> : Number of shift registers: 0
> : Total number of LUT4s: 1242
> :
> : Number of block RAMs: 4 out of 7 (57%)
>
> The core proper (32 bit datapath, 16 bit instructions)
> is currently ~800 LUT4 in its default configuration.
> [ I miss TBUF's when working on processor datapaths. ]
>
> I don't have the XO2 design checked in, but the similar
> XP2 version is in the following code repository, under
> trunk/hdl/systems/evb_lattice_xp2_brevia :
>
> http://code.google.com/p/yard-1/
>
> The above is still very much a work-in-progress, but
> far enough along to use for small assembly projects
> ( note that interrupts are currently broken ).
The trick to datapaths in CPU designs is to minimize the number of inputs onto a "bus" which is implemented as multiplexers. Minimizing inputs gains speed and minimizes logic. When possible, put the muxes inside some RAM on the chip to good use. I got sidetracked on my last iteration of a CPU design, which was going to use a block RAM as the "register file" and stack in one. Since then I've read about some other designs which use similar ideas, although not identical.

Why did you roll your own RISC design when each FPGA maker has their own? The Lattice version is even open source.

-- Rick
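As a sketch of the RAM-as-register-file idea (a common inference pattern; names are hypothetical, not rickman's actual design): with the registers in a RAM, the read-address decode inside the RAM replaces what would otherwise be a wide operand mux.

  library ieee;
  use ieee.std_logic_1164.all;
  use ieee.numeric_std.all;

  entity regfile is
    port (
      clk   : in  std_logic;
      we    : in  std_logic;
      waddr : in  unsigned(4 downto 0);
      wdata : in  std_logic_vector(31 downto 0);
      raddr : in  unsigned(4 downto 0);
      rdata : out std_logic_vector(31 downto 0)
    );
  end entity regfile;

  architecture rtl of regfile is
    type ram_t is array (0 to 31) of std_logic_vector(31 downto 0);
    signal ram : ram_t;
  begin
    process (clk)
    begin
      if rising_edge(clk) then
        if we = '1' then
          ram(to_integer(waddr)) <= wdata;  -- write port
        end if;
        rdata <= ram(to_integer(raddr));    -- registered read, maps into block RAM
      end if;
    end process;
  end architecture rtl;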