Beginning FPGA programming

Started by James Harris September 2, 2007
I'd like to try out some ideas and would appreciate some guidance.
Would a 200k-gate FPGA be enough for a simple or complex 8-bit CPU
design? I have this Digilent product:

   http://www.digilentinc.com/Products/Detail.cfm?Prod=NEXYS

but being totally new to hardware design I have some questions:

1. What language would be suitable - VHDL or Verilog? Or are there
others?

2. What description style would be appropriate? Or can I break the
design into modules, initially make each module with a high level
description and rewrite it at a lower level later as needed?

On 2 Sep, 23:40, James Harris <james.harri...@googlemail.com> wrote:
(snip)
Should have said that part of the design is a large crossbar switch. It may be relevant to the number of gates and/or the design style.

--
TIA,
James
On 2007-09-02, James Harris <james.harris.1@googlemail.com> wrote:
(snip)
> Should have said that part of the design is a large crossbar switch.
> It may be relevant to the number of gates and/or the design style.
How large is "large"? It should be fairly simple to calculate the size of a crossbar switch, though.

Assuming the switch is implemented using muxes:
  A 2-to-1 mux uses 1 LUT
  A 4-to-1 mux uses 2 LUTs
  An 8-to-1 mux uses 4 LUTs
  A 16-to-1 mux uses 8 LUTs
  (and so on)

Multiply this by the number of ports and the width of each port to get a rough total LUT cost. (This ignores the cost of the arbiter or other configuration logic for the crossbar.)

An XC3S200 has 3840 LUTs if I calculate correctly. So an 8x8 crossbar with 8-bit-wide ports should fit fairly comfortably, whereas, for example, a 16x16 crossbar of width 16 will consume about half the FPGA.

How often do you need to reconfigure the inputs/outputs of the crossbar? If it is not very often, perhaps you could serially load configuration data into SRL16 elements to reduce the number of required LUTs.

What kind of bitrate do you need through the crossbar? Perhaps you could use a time-multiplexed bus instead?

/Andreas
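The rule of thumb in the post above can be turned into a quick estimate. The following is only a sketch in Python (used here as a calculator, not as a hardware description): the function names are made up for illustration, the N/2 LUTs per N-to-1 mux figure and the 3840-LUT total are the ones quoted in the post, and rounding a non-power-of-two input count up to the next power of two is an added assumption.

```python
XC3S200_LUTS = 3840  # LUT count quoted in the post above


def mux_luts(inputs: int) -> int:
    """LUTs for one N-to-1 mux under the post's rule of thumb (N/2 LUTs).

    Assumption: input counts that are not a power of two are rounded up.
    """
    if inputs < 2:
        return 0
    padded = 1 << (inputs - 1).bit_length()  # next power of two >= inputs
    return padded // 2


def crossbar_luts(ports: int, width: int) -> int:
    """One mux per output bit: ports outputs, each `width` bits wide."""
    return ports * width * mux_luts(ports)


for ports, width in [(8, 8), (16, 16)]:
    luts = crossbar_luts(ports, width)
    share = luts / XC3S200_LUTS
    print(f"{ports}x{ports} crossbar, {width}-bit ports: {luts} LUTs ({share:.0%} of the device)")
```

Like the post, this ignores the arbiter and any other configuration logic, so the real figure would be somewhat higher.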
On 2 Sep, 23:44, Andreas Ehliar <ehl...@lysator.liu.se> wrote:
...
> > Should have said that part of the design is a large crossbar switch.
> > It may be relevant to the number of gates and/or the design style.
>
> How large is "large"? But it should be fairly simple to calculate the
> size of a crossbar switch.
Shooting from the hip somewhat I think I could start with about seven ports (to test the concept) each being 8-bit. I need to pass a strobe with each input to the switch and possibly an acknowledge fed /back/ from each output. So there would be 10 bits (8 data + 1 strobe + 1 ack) per port leading, I think, to a 70x70 crossbar.
> Assuming the switch is implemented using muxes:
> A 2-to-1 mux uses 1 LUT
> A 4-to-1 mux uses 2 LUTs
> An 8-to-1 mux uses 4 LUTs
> A 16-to-1 mux uses 8 LUTs
> (and so on)
If it were implemented on bare silicon I think it could be made from one transistor or possibly a pair of transistors per junction. The main problem in that case would be routing of inputs, outputs and controls. For an FPGA I had no idea how it could be implemented. Thanks for the tip on using muxes. Do you mean each output would be fed from an 8-to-1 mux (to select the appropriate input line for each output)? So I would need 70 such muxes?
> Multiply this with the number of ports and the width of each port to
> get a rough total LUT cost. (Ignoring the cost of the arbiter or
> other configuration logic for the crossbar.)
>
> An XC3S200 has 3840 LUTs if I calculate correctly. So an 8x8 crossbar
> with 8 bit wide ports should fit fairly comfortably whereas for example
> a 16x16 crossbar of width 16 will consume half the FPGA.
>
> How often do you need to reconfigure the inputs/outputs of the
> crossbar? If it is not very often, perhaps you could serially load
> configuration data into SRL16 elements to reduce the number of
> required LUTs.
The crossbar's mapping of inputs to outputs would change potentially every system 'cycle', though instead of using clock cycles I intend to use handshaking to sync transfers. This is part of the concept: 1) faster, since compute elements do not have to wait for the next clock cycle in order to complete their work, and 2) lower power consumption, because the system is not clocked. Someone will probably tell me this has already been done, though, or is not effective. Maybe I'm reinventing a wheel and one not as good as existing ones....
> What kind of bitrate do you need through the crossbar? Perhaps you
> could use a time-multiplexed bus instead?
The rate isn't important at this stage. I want to test a concept rather than make a production system. Yes, one option is to have tri-state output latches on the elements and, rather than have them talk through a switch, have them talk over a couple of buses instead. I may need output latches anyway.

I understand I can write this in VHDL or Verilog. Any suggestions on whether a newbie like me should use a particular style of description to implement the above? I guess I should avoid gate level to avoid too much complexity and also avoid high-level concepts so it can be readily synthesized. Is that about right?

--
James
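The mux-per-output arrangement discussed in this post can be prototyped as a behavioural model before any HDL is written. The sketch below is Python, not VHDL or Verilog, and is only an illustration: the `crossbar` function, its argument names and the example port assignments are all hypothetical. One detail it makes visible is that data and strobe travel from input to output, while the acknowledge travels from output back to input.

```python
from typing import List, Optional, Tuple

PORTS = 7  # port count floated in the post above


def crossbar(select: List[Optional[int]],
             data: List[int],
             strobe: List[bool],
             ack_from_outputs: List[bool]) -> Tuple[List[int], List[bool], List[bool]]:
    """Route data and strobe forward and ack backward.

    select[o] is the input port feeding output o, or None if output o is idle.
    """
    out_data = [0] * len(select)
    out_strobe = [False] * len(select)
    ack_to_inputs = [False] * len(data)
    for out_port, src in enumerate(select):
        if src is None:
            continue
        out_data[out_port] = data[src] & 0xFF   # 8-bit data path, one mux per output
        out_strobe[out_port] = strobe[src]      # strobe travels with the data
        if ack_from_outputs[out_port]:
            ack_to_inputs[src] = True           # ack travels the opposite way
    return out_data, out_strobe, ack_to_inputs


select: List[Optional[int]] = [None] * PORTS
select[3] = 0   # input 0 feeds output 3
select[5] = 2   # input 2 feeds output 5
data = [0xA5, 0, 0x3C, 0, 0, 0, 0]
strobe = [True, False, True, False, False, False, False]
acks = [False, False, False, True, False, False, False]  # only output 3 acknowledges

out_data, out_strobe, ack_back = crossbar(select, data, strobe, acks)
print(hex(out_data[3]), hex(out_data[5]))
print(ack_back)
```

A model like this says nothing about timing or resource use; it is only a way to pin down the intended routing behaviour.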
On 2007-09-03, James Harris <james.harris.1@googlemail.com> wrote:
> Thanks for the tip on using muxes.
I think muxes are your best bet.
> Do you mean each output would be fed from an 8-to-1 mux (to select the
> appropriate input line for each output)? So I would need 70 such
> muxes?
Yes. (Assuming 10 bits are enough for your design.) That shouldn't take too much space.
> cycle in order to complete their work and 2) lower power consumption
> because the system is not clocked. Someone will probably tell me this
> has already been done, though, or is not effective. Maybe I'm
> reinventing a wheel and one not as good as existing ones....
I don't remember any exact reference now but I remember reading postings on this group saying that it might be better to pipeline an FPGA design more than you might think since hazard spikes propagating through a long net might consume more power than the network itself. Something to keep in mind at least. I haven't tried to verify this myself though. Searching the group should yield some references which you might be interested in evaluating before you start working on this.
>> What kind of bitrate do you need through the crossbar? Perhaps you
>> could use a time-multiplexed bus instead?
>
> The rate isn't important at this stage. I want to test a concept
> rather than make a production system. Yes, one option is to have
> tri-state output latches on the elements and, rather than have them talk
> through a switch, have them talk over a couple of buses instead. I may
> need output latches anyway.
Actually there are no tri-state buffers inside newer Xilinx FPGAs, so even for the time-multiplexed bus you would use muxes. (Most of the time when people talk about buses inside FPGAs, the bus itself would be implemented using muxes nowadays.) Same thing with ASICs: you really want to avoid tri-state buses in an ASIC if you can.

If you write a 'Z' on an internal signal in VHDL or Verilog, it would be converted into logic that doesn't use tri-state buffers. (I would recommend that you don't use it anyway; in some cases you can get a simulation/synthesis mismatch.)

/Andreas
On 2007-09-03, Andreas Ehliar <ehliar@lysator.liu.se> wrote:
> Actually there are no tri-state buffers inside newer Xilinx FPGAs so
> even for the time-multiplexed bus you would use muxes.
Upon thinking about this for a minute or so, I realized that I should probably take that back and rephrase it as: "There are no user-accessible tri-state buffers inside newer Xilinx FPGAs." In older devices you had TBUFs, but they are no longer available.

Longer disclaimer:
Since I don't work at Xilinx, I have no idea how the routing inside Xilinx devices is actually implemented, but it seems like some nets can be driven from either side depending on how the FPGA is configured. That might be implemented with some sort of tri-state buffers. (Unless this is just a simplification in the FPGA editor of how it really works. The xdl report file also hints that there are some nets that can be driven in both directions.) Anyway, as users of the device we don't really have to care about these details.

/Andreas
Andreas Ehliar wrote:
(snip)

> Longer disclaimer:
> Since I don't work at Xilinx I have no idea how the routing inside
> Xilinx devices is actually implemented but it seems like some nets
> can be driven from either side depending on how the FPGA is
> configured. That might be implemented with some sort of tristate
> buffers. (Unless this is just a simplification in the FPGA
> editor of how it really works. The xdl report file also hints that
> there are some nets that can be driven in both directions.)
> Anyway, as a user of the device we don't really have to care about
> these details.
As I understand it, it is related to the physics of the device. As the wires get narrower and longer, the resistance increases faster than the capacitance decreases. The solution is to put buffers along the line. As the buffers need a direction, that makes it hard to do true tri-state that the earlier Xilinx devices did.

The solution is to use a mux, possibly driven by a priority encoder, maybe just using AND or OR onto the output line. It should work such that it gives the right result when only one is selected.

-- glen
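The mux-with-priority-encoder idea from the post above can be sketched as a small behavioural model. This is Python rather than an HDL, and the function names and example values are invented for illustration; the choice that the lowest-numbered request wins is an assumption, not something the post specifies.

```python
from typing import List


def priority_one_hot(requests: List[bool]) -> List[bool]:
    """Grant only the lowest-numbered active request, giving a one-hot result."""
    grants = []
    seen = False
    for req in requests:
        grants.append(req and not seen)
        seen = seen or req
    return grants


def and_or_mux(sources: List[int], enables: List[bool]) -> int:
    """AND each source with its enable, then OR everything onto one output."""
    out = 0
    for value, enabled in zip(sources, enables):
        out |= value if enabled else 0
    return out


sources = [0x11, 0x22, 0x44, 0x88]
requests = [False, True, False, True]   # two drivers want the line at once

grants = priority_one_hot(requests)
print(grants)
print(hex(and_or_mux(sources, grants)))    # only the granted source appears
print(hex(and_or_mux(sources, requests)))  # without the encoder the values are ORed together
```

The last line shows why the "only one is selected" condition matters: with two enables active, the output is the bitwise OR of both sources rather than either one.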
Hi,

Internal tristates are gone from Xilinx devices.

There is a way to implement large muxes efficiently by using DFFs and the carry chain.
The solution uses many DFFs, but a design usually uses fewer DFFs than LUTs, so spare DFFs are often available.
Each source to the mux passes through a DFF with a synchronous reset.
All DFFs are kept in reset except those of the source you have selected.
This allows you to simply OR all the sources together, since only the selected source is not held in reset.
The ORing can be done using the carry chain to decrease the LUT usage even further.
It is in fact an AND-OR structure, but the AND comes from the synchronous reset in the DFF.

Example:
16-bit buses and you need a 16-to-1 mux.
Using normal muxes would require 16*8 = 128 LUTs.
With this solution you need 4 LUTs for ORing 16 sources.
So the 16-to-1 mux would consume 16*4 = 64 LUTs and 16*16 = 256 DFFs.

So the DFF usage is high, but you use 50% fewer LUTs.

Göran
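The registered AND-OR mux described in this post can be modelled in a few lines to check both the behaviour and the arithmetic. This is an illustrative Python sketch, not HDL: `ResettableReg` and `registered_or_mux` are hypothetical names, each register object stands for a whole 16-bit word (16 DFFs), and the "4 LUTs to OR 16 sources" figure is taken from the post as given.

```python
from typing import List


class ResettableReg:
    """A word-wide register with synchronous reset (stands in for 16 DFFs)."""

    def __init__(self) -> None:
        self.q = 0

    def clock(self, d: int, reset: bool) -> None:
        # On a clock edge: load 0 while in reset, otherwise load the input.
        self.q = 0 if reset else d


def registered_or_mux(regs: List[ResettableReg], sources: List[int], selected: int) -> int:
    """Clock every register once, holding all but the selected one in reset,
    then OR the register outputs together."""
    for index, (reg, d) in enumerate(zip(regs, sources)):
        reg.clock(d, reset=(index != selected))
    out = 0
    for reg in regs:
        out |= reg.q
    return out


WIDTH, SOURCES = 16, 16
regs = [ResettableReg() for _ in range(SOURCES)]
sources = [0x1000 + i for i in range(SOURCES)]
result = registered_or_mux(regs, sources, selected=5)
print(hex(result))

plain_mux_luts = WIDTH * (SOURCES // 2)   # 8 LUTs per bit for an ordinary 16-to-1 mux
or_mux_luts = WIDTH * (SOURCES // 4)      # 4 LUTs per bit to OR 16 sources (figure from the post)
dffs = WIDTH * SOURCES
print(plain_mux_luts, or_mux_luts, dffs)
```

Note that the selected value appears one clock edge after the select changes, which is the price of moving the AND into the registers.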
On Sep 3, 1:53 am, James Harris <james.harri...@googlemail.com> wrote:
> Shooting from the hip somewhat I think I could start with about seven
> ports (to test the concept) each being 8-bit. I need to pass a strobe
> with each input to the switch and possibly an acknowledge fed /back/
> from each output. So there would be 10 bits (8 data + 1 strobe + 1
> ack) per port leading, I think, to a 70x70 crossbar.
Isn't that a 7x7 crossbar with a 10-bit data path and 3 bits of addressing to specify a port? Total size would be 10*2*7 = 140 LUTs for the crossbar port input mux tree. That is a much different problem from a 70x70 crossbar with a single-bit data path and 7 bits of addressing, which requires 70*64 = 4,480 LUTs.
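The difference between switching whole ports and switching individual bits shows up directly in the number of select bits. The sketch below is a rough Python calculation under stated assumptions: the function names are invented, the N/2 LUTs per N-to-1 mux rule and the 3840-LUT device total come from earlier in the thread, and each mux is assumed to be padded to the next power-of-two input count.

```python
XC3S200_LUTS = 3840  # LUT count quoted earlier in the thread


def select_bits(inputs: int) -> int:
    """Address bits needed to pick one of `inputs` sources."""
    return max(1, (inputs - 1).bit_length())


def config_bits(outputs: int, inputs: int) -> int:
    """Total select bits if every output has its own independently addressed mux."""
    return outputs * select_bits(inputs)


def single_bit_crossbar_luts(size: int) -> int:
    """LUTs for a size x size crossbar of 1-bit paths, at N/2 LUTs per N-to-1 mux."""
    padded_inputs = 1 << select_bits(size)
    return size * (padded_inputs // 2)


print(select_bits(7), config_bits(7, 7))      # port-level switching
print(select_bits(70), config_bits(70, 70))   # bit-level switching
luts = single_bit_crossbar_luts(70)
print(luts, luts > XC3S200_LUTS)
```

Under these assumptions the bit-level 70x70 version needs more LUTs than the device quoted earlier provides, before any control logic is counted.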
On Sep 2, 4:43 pm, James Harris <james.harri...@googlemail.com> wrote:
> Should have said that part of the design is a large crossbar switch.
> It may be relevant to the number of gates and/or the design style.
Should have asked: what's the data rate through the crossbar per port? There are sometimes cheaper alternatives to a crossbar, such as message switching.