Reply by Peter Alfke June 11, 2004
Let me correct this:
1.
Resistance by itself does not matter. The product of resistance times
capacitance matters.
2.
When the process is scaled down, the metal thickness really is not (or
hardly) reduced. It still is around a micron, as it has been for years.
So, when all horizontal dimensions are cut in half, the metal traces
become half as wide and half as long, which means resistance is
constant. Yes, most metal traces are now much thicker than they are wide!
And it would appear that the capacitance of a half-width trace that is
half as long would be reduced by 75%, and the RC product would thus be 4
times lower.

Reality is less benign, since the capacitive fringe effects take over,
and the sidewall capacitance and the trace-to-trace capacitance really
increase.
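
As a rough check of the first-order arithmetic, here is a minimal Python sketch. It assumes normalized units, a parallel-plate capacitance proportional to trace area, and a hypothetical helper named `wire_rc`; it deliberately leaves out the fringe, sidewall, and trace-to-trace terms, so it reproduces only the optimistic estimate, not the real behavior.

```python
def wire_rc(length, width, thickness, rho=1.0, cap_per_area=1.0):
    """First-order model of one trace (normalized units, an assumption).

    R = rho * L / (W * T)
    C = cap_per_area * L * W   (parallel-plate term only, no fringe or sidewall)
    """
    resistance = rho * length / (width * thickness)
    capacitance = cap_per_area * length * width
    return resistance, capacitance, resistance * capacitance


# Old process: reference trace.
old_r, old_c, old_rc = wire_rc(length=1.0, width=1.0, thickness=1.0)
# New process: horizontal dimensions halved, metal thickness unchanged.
new_r, new_c, new_rc = wire_rc(length=0.5, width=0.5, thickness=1.0)

r_ratio = new_r / old_r     # resistance stays constant
c_ratio = new_c / old_c     # area-based capacitance drops to one quarter
rc_ratio = new_rc / old_rc  # so the ideal RC product is 4 times lower

print(r_ratio, c_ratio, rc_ratio)
```

Adding a per-unit-length sidewall term to `capacitance` would be one way to explore how quickly the ideal 4x gain erodes, but the coefficients for that are process data not given here.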

Interconnect delays matter, but we can send a signal over quite a
distance in a single nanosecond. And on clock lines we can magically
eliminate the delay completely, using a DCM.

Peter Alfke

Phil Hays wrote:
> (Hal Murray) wrote:
>
> <Snip>
> >Suppose I design a FPGA with old fashioned tbufs and long lines, but
> >don't cover the width of the whole chip, but just X LUT/FF units.
> >Would that track other speed improvements as silicon gets faster?
>
> No, interconnect does not scale. Interconnect gets slower as the
> device geometry gets smaller.
>
> Transistors scale. As they get smaller, the operating voltage
> decreases and the switching speed increases. Nice, eh?
>
> Interconnect has a bulk resistivity set by the material. The end-to-end
> resistance is (resistivity*length)/(width*thickness). If the
> ratios between length : width : thickness are constant, the resistance
> doubles if the size halves. This is why interconnect was almost
> ignorable at 3 micron geometry and is a major source of delay at 0.09
> micron, even after changing to copper with a lower bulk resistivity.
>
> --
> Phil Hays
> Phil_hays at posting domain should work for email

Reply by Phil Hays June 11, 2004
(Hal Murray) wrote:

<Snip>
>Suppose I design a FPGA with old fashioned tbufs and long lines, but
>don't cover the width of the whole chip, but just X LUT/FF units.
>Would that track other speed improvements as silicon gets faster?

No, interconnect does not scale. Interconnect gets slower as the
device geometry gets smaller.

Transistors scale. As they get smaller, the operating voltage
decreases and the switching speed increases. Nice, eh?

Interconnect has a bulk resistivity set by the material. The end-to-end
resistance is (resistivity*length)/(width*thickness). If the
ratios between length : width : thickness are constant, the resistance
doubles if the size halves. This is why interconnect was almost
ignorable at 3 micron geometry and is a major source of delay at 0.09
micron, even after changing to copper with a lower bulk resistivity.

--
Phil Hays
Phil_hays at posting domain should work for email
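
The doubling claim follows directly from the resistance formula in the post. The short derivation below is a sketch under the post's own constant-ratio assumption; the symbols (rho for resistivity, L, W, T for length, width, thickness, and s for the linear scale factor) are introduced here for illustration.

```latex
R = \frac{\rho L}{W T}
\qquad\Longrightarrow\qquad
R' = \frac{\rho\,(sL)}{(sW)(sT)} = \frac{R}{s},
\qquad
s = \tfrac{1}{2} \;\Rightarrow\; R' = 2R .
```

If the thickness T is instead held fixed while L and W scale, the factor s cancels and R' = R, so the conclusion depends on whether the metal thickness really shrinks with the process.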
Reply by Hal Murray June 10, 2004
>Keep in mind that the newer Xilinx chips have a MUXF6 which allows up to
>8 input muxes to be made with a single level of delay. That compares
>well with the 16 input mux you can make from an Altera LAB. Routing is
>an issue, but the speed of the tbufs driving long lines makes them pretty
>impractical for the newer chips running at high speeds. If you don't
>need speed, you can use a single wire with a serial bus to reduce the
>amount of logic and routing used. What the newer chips provide is speed
>and lots of it. That can do a lot to reduce the size of a design.

I'm not sure that "lots of speed" translates into don't need tbufs.

I'd expect that designers' expectations and goals would grow to use
all available resources - both space and time.

Yes, if I'm using a modern/fast part to implement an old design,
I may be able to make speed/space tradeoffs. But I could also
be speeding up the whole project and expecting a state machine
that used to run at X MHz to now run at 3X or 5X. (adjust
your goals to match the age of your design)

Is there something fundamentally evil with tbufs? Or is the problem
that they don't scale because the chips are getting bigger (when
measured in gates, not microns)?

Suppose I design a FPGA with old fashioned tbufs and long lines, but
don't cover the width of the whole chip, but just X LUT/FF units.
Would that track other speed improvements as silicon gets faster?

--
The suespammers.org mail server is located in California. So are all
my other mailboxes. Please do not send unsolicited bulk e-mail or
unsolicited commercial e-mail to my suespammers.org address or any of
my other addresses. These are my opinions, not necessarily my employer's.
I hate spam.
Reply by Ray Andraka June 4, 2004
Jan,

Of course!  I was thinking in terms of V2 not V2P.  Don't get to use the latter
as much as the former because of the nature of the clients I've been dealing
with (several space and military projects).

Jan Gray wrote:

> "Ray Andraka" <ray@andraka.com> wrote in message
> news:40C067CE.C332F829@andraka.com...
> > Also, if you put one block Ram per processor, you get an area of at least 8x20
> > CLBs for each block RAM. I don't miss the TBUFs as much as I thought I
> > would...most of the time.
>
> Ray, there are (not uncoincidentally) 4Rx6C of CLB / BRAM+mult in Virtex-II
> Pro devices, yes? And up to 444 BRAMs per device? :-)
>
> Also, for good old XCV600E, (NB half as many slices per CLB), I used 8Rx6C
> per processor, floorplanning 60 16-bit CPU + BRAM tiles or 36 32-bit CPUs +
> 2 BRAM tiles. [http://www.fpgacpu.org/log/mar02.html#020302]
>
> TBUFs R.I.P.
>
> Jan Gray
> Gray Research LLC

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

"They that give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759
Reply by Ray Andraka June 4, 2004
The MUXF5s and MUXF6s have the wrong pitch to match up to arithmetic, which makes
them a pain in the tail to use on heavily arithmetic designs. The mux pitch has
been a consistent complaint about the Virtex architecture. Routing to them, as you
point out, is also an issue.

rickman wrote:

> Keep in mind that the newer Xilinx chips have a MUXF6 which allows up to
> 8 input muxes to be made with a single level of delay. That compares
> well with the 16 input mux you can make from an Altera LAB. Routing is
> an issue, but the speed of the tbufs driving long lines makes them pretty
> impractical for the newer chips running at high speeds. If you don't
> need speed, you can use a single wire with a serial bus to reduce the
> amount of logic and routing used. What the newer chips provide is speed
> and lots of it. That can do a lot to reduce the size of a design.
>
> --
>
> Rick "rickman" Collins
>
> rick.collins@XYarius.com
> Ignore the reply address. To email me use the above address with the XY
> removed.
>
> Arius - A Signal Processing Solutions Company
> Specializing in DSP and FPGA design URL http://www.arius.com
> 4 King Ave 301-682-7772 Voice
> Frederick, MD 21701-3110 301-682-7666 FAX

Reply by rickman June 4, 2004
Hal Murray wrote:
> >Also, if you put one block Ram per processor, you get an area of at least 8x20
> >CLBs for each block RAM. I don't miss the TBUFs as much as I thought I
> >would...most of the time.
>
> One interesting aspect of the TBUFs is that they went onto long lines,
> which were, well, long. That helped simplify floor planning.
>
> Assume that I have a design in mind where I would have used TBUFs.
> Is there some layout pattern that works well after I switch to
> using MUXes? Do I just toss it on the chip in some sensible
> looking way and assume the routing will be good enough? What if
> I'm pushing the speed or density envelope?
>
> I guess I'm slightly surprised that some quirky feature hasn't
> evolved to replace that niche - something like a 2:1 mux or 2 input
> OR tied to special routing. (with a pitch to match an adder
> using the dedicated carry logic) Maybe the routing is just good
> enough for the old type of design and newer chips are big enough
> so that the typical design is a different sort of project.

Keep in mind that the newer Xilinx chips have a MUXF6 which allows up to
8 input muxes to be made with a single level of delay. That compares
well with the 16 input mux you can make from an Altera LAB. Routing is
an issue, but the speed of the tbufs driving long lines makes them pretty
impractical for the newer chips running at high speeds. If you don't
need speed, you can use a single wire with a serial bus to reduce the
amount of logic and routing used. What the newer chips provide is speed
and lots of it. That can do a lot to reduce the size of a design.

--

Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design URL http://www.arius.com
4 King Ave 301-682-7772 Voice
Frederick, MD 21701-3110 301-682-7666 FAX
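
For readers unfamiliar with the MUXF5/MUXF6 arrangement mentioned in this post, the following Python sketch is a purely behavioural model of an 8-input mux built as a tree. It assumes the usual arrangement in which each LUT implements a 2:1 mux, a MUXF5 picks between two LUT outputs, and a MUXF6 picks between two MUXF5 outputs. The function names `mux2` and `mux8` are hypothetical, and nothing here models delay or routing.

```python
def mux2(sel, a, b):
    """2:1 mux: returns b when sel is 1, otherwise a."""
    return b if sel else a


def mux8(sel, inputs):
    """8:1 mux as a three-level tree (LUTs, then MUXF5s, then one MUXF6)."""
    s0 = sel & 1          # select bit fed to the four LUTs
    s1 = (sel >> 1) & 1   # select bit fed to the two MUXF5s
    s2 = (sel >> 2) & 1   # select bit fed to the MUXF6
    # Four LUTs, each choosing between a pair of adjacent inputs.
    lut = [mux2(s0, inputs[i], inputs[i + 1]) for i in range(0, 8, 2)]
    # Two MUXF5s, each combining two LUT outputs into a 4:1 mux.
    f5 = [mux2(s1, lut[0], lut[1]), mux2(s1, lut[2], lut[3])]
    # One MUXF6 combining the two 4:1 muxes into an 8:1 mux.
    return mux2(s2, f5[0], f5[1])


data = [10, 11, 12, 13, 14, 15, 16, 17]
selected = [mux8(s, data) for s in range(8)]
print(selected)
```

The point of the structure is that only the first level consumes LUTs; the second and third levels use the dedicated mux resources, which is why the whole 8:1 mux costs four LUTs per bit.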
Reply by Hal Murray June 4, 2004
>Also, if you put one block Ram per processor, you get an area of at least 8x20
>CLBs for each block RAM. I don't miss the TBUFs as much as I thought I
>would...most of the time.

One interesting aspect of the TBUFs is that they went onto long lines,
which were, well, long. That helped simplify floor planning.

Assume that I have a design in mind where I would have used TBUFs.
Is there some layout pattern that works well after I switch to
using MUXes? Do I just toss it on the chip in some sensible
looking way and assume the routing will be good enough? What if
I'm pushing the speed or density envelope?

I guess I'm slightly surprised that some quirky feature hasn't
evolved to replace that niche - something like a 2:1 mux or 2 input
OR tied to special routing. (with a pitch to match an adder
using the dedicated carry logic) Maybe the routing is just good
enough for the old type of design and newer chips are big enough
so that the typical design is a different sort of project.

--
The suespammers.org mail server is located in California. So are all
my other mailboxes. Please do not send unsolicited bulk e-mail or
unsolicited commercial e-mail to my suespammers.org address or any of
my other addresses. These are my opinions, not necessarily my employer's.
I hate spam.
Reply by Jan Gray June 4, 2004
"Ray Andraka" <ray@andraka.com> wrote in message
news:40C067CE.C332F829@andraka.com...
> Also, if you put one block Ram per processor, you get an area of at least 8x20
> CLBs for each block RAM. I don't miss the TBUFs as much as I thought I
> would...most of the time.

Ray, there are (not uncoincidentally) 4Rx6C of CLB / BRAM+mult in Virtex-II
Pro devices, yes? And up to 444 BRAMs per device? :-)

Also, for good old XCV600E, (NB half as many slices per CLB), I used 8Rx6C
per processor, floorplanning 60 16-bit CPU + BRAM tiles or 36 32-bit CPUs +
2 BRAM tiles. [http://www.fpgacpu.org/log/mar02.html#020302]

TBUFs R.I.P.

Jan Gray
Gray Research LLC
Reply by Ray Andraka June 4, 2004
Oh yeah, one other trick that sometimes helps. If your resets are available on
flip-flops leading into a 4:1 mux, you can use the resets as a select so that the
mux reduces to a 4 input OR. Sometimes works for pipelined stuff, but probably
not good for your processor.
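
A small behavioural sketch of this reset-as-select trick, in Python. It assumes the source registers reset to zero and that exactly one of them is left out of reset; the name `or_mux` is hypothetical, and the model ignores the fact that in hardware the reset takes effect on the clock edge before the OR is evaluated.

```python
def or_mux(registers, select):
    """Model a mux implemented as a plain OR of registered sources.

    Every source flip-flop except the selected one is held in reset,
    so it contributes 0 and the OR passes the selected value through.
    """
    # Apply the resets: unselected registers are forced to zero.
    gated = [value if index == select else 0
             for index, value in enumerate(registers)]
    # The "mux" is now just a 4-input OR per bit.
    result = 0
    for value in gated:
        result |= value
    return result


sources = [0b1010, 0b0110, 0b1111, 0b0001]
outputs = [or_mux(sources, s) for s in range(4)]
print([bin(value) for value in outputs])
```

Because a 4-input OR fits in a single 4-input LUT per bit, this halves the LUT cost of a 4:1 mux built from 2:1 stages, at the price of needing the select one pipeline stage early.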

Ray Andraka wrote:

> Jan,
>
> Also, if you put one block Ram per processor, you get an area of at least 8x20
> CLBs for each block RAM. I don't miss the TBUFs as much as I thought I
> would...most of the time.
>
> Jan Gray wrote:
>
> > When you're trying to squeeze a pipelined RISC processor into a small tile
> > (say 4Rx6C of CLBs + 1 BRAM), (because you intend to tile dozens or hundreds
> > of processors per FPGA), and your result bus needs to mux amongst 4+
> > sources, and you have to burn several LUTs/bit just for lousy *muxes*, fer
> > gosh sakes, THEN you will shed a nostalgic tear for TBUFs passed (or other
> > non-LUT resources for wide horizontal muxes).
> >
> > The xr16 profitably used a TBUF for every LUT site in the datapath.
> > [http://www.fpgacpu.org/xsoc/cc.html]
> >
> > The loss isn't so bad once you learn the trick to implement
> > o = a + b ? c;
> > or even
> > o = mux(sel1, sel2){a + b, a - b, a & b, a ^ b};
> > in one LUT per bit. [http://www.fpgacpu.org/log/nov00.html#001112]
> >
> > Jan Gray
> > Gray Research LLC
>
> --
> --Ray Andraka, P.E.
> President, the Andraka Consulting Group, Inc.
> 401/884-7930 Fax 401/884-7950
> email ray@andraka.com
> http://www.andraka.com
>
> "They that give up essential liberty to obtain a little
> temporary safety deserve neither liberty nor safety."
> -Benjamin Franklin, 1759

Reply by Ray Andraka June 4, 2004
Jan,

Also, if you put one block Ram per processor, you get an area of at least 8x20
CLBs for each block RAM.  I don't miss the TBUFs as much as I thought I
would...most of the time.

Jan Gray wrote:

> When you're trying to squeeze a pipelined RISC processor into a small tile
> (say 4Rx6C of CLBs + 1 BRAM), (because you intend to tile dozens or hundreds
> of processors per FPGA), and your result bus needs to mux amongst 4+
> sources, and you have to burn several LUTs/bit just for lousy *muxes*, fer
> gosh sakes, THEN you will shed a nostalgic tear for TBUFs passed (or other
> non-LUT resources for wide horizontal muxes).
>
> The xr16 profitably used a TBUF for every LUT site in the datapath.
> [http://www.fpgacpu.org/xsoc/cc.html]
>
> The loss isn't so bad once you learn the trick to implement
> o = a + b ? c;
> or even
> o = mux(sel1, sel2){a + b, a - b, a & b, a ^ b};
> in one LUT per bit. [http://www.fpgacpu.org/log/nov00.html#001112]
>
> Jan Gray
> Gray Research LLC