Superscalar Out-of-Order Processor on an FPGA

Started by Luke May 9, 2006
Henry Wong wrote:
> Luke wrote:
> > I've got a little hobby of mine in developing processors for FPGAs.
> > I've designed several pipelined processors and a multicycle processor.
>
> I wonder how you manage to find enough time to do all of this!
>
> I've only found time during my undergraduate years to build two
> processors, both with an excuse:
>
> A multicycle (>= 12!) processor for a 2nd-year digital logic project
> (16-bit, 13MHz on Altera EPF10K70)
>
> A pipelined processor (no I/O though, so useless) built for clock speed
> as a summer research project at my university.
> (32-bit, 250MHz on Altera Stratix EP1S40)
>
> I haven't ever found enough time otherwise!
Haha, I designed a soft-core CPU as well. It is a 16-bit one and is quite slow, but I could easily have sped it up had I not used BlockRAM for the registers (also used for the internal stack). I like having banked registers, though, for fast and effortless context switching (since I am writing the stuff in assembly). I plan on pipelining it once I get some apps up and running (I need to write an assembler first).

I think it is a result of good time management! I salute you, Luke!

I must also say, I never found delay slots to be a feature. I have ALWAYS considered them a bug. By any chance, do you work for Hitachi? :P (Check out the SuperH series CPUs if you didn't get the joke :) )

-Isaac

Sorry, it's "Renesas" now.

I designed a 2-way issue out-of-order processor recently. It has
8-slot-deep reorder buffers and 4 execution units: 2 ALUs, 1 branch unit,
and 1 load/store (L/S) unit. Simple branch-taken scheme.
It takes about 9000 LCs in a Xilinx V4 FX12 FPGA, running at 33 MHz with
1.2-1.5 IPC.

The major implementation issue is the 2W4R (two-write, four-read) register
file; I have to double-clock the write port.
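
As a rough behavioural illustration of the double-clocked write port described above, here is a small Python sketch. It is not alpha's Verilog (which is not shown in the thread). The class name, the 32-register size, the 32-bit data width, and the "second write wins" policy are all assumptions made for the example. It only shows the idea of sharing one physical write port between two logical writes by running it at twice the core clock; read/write timing within a cycle is not modelled.

```python
class DoublePumpedRegFile:
    """Toy model of a 2-write/4-read (2W4R) register file.

    Assumption: the memory has ONE physical write port clocked at twice the
    core clock, so two logical writes can retire in each core cycle.
    """

    def __init__(self, num_regs=32):
        self.regs = [0] * num_regs
        self.fast_ticks = 0  # ticks of the hypothetical 2x write clock

    def read4(self, a, b, c, d):
        """Four independent read ports, modelled as plain lookups."""
        return tuple(self.regs[i] for i in (a, b, c, d))

    def core_cycle(self, write0=None, write1=None):
        """One core-clock cycle: write0 uses phase 0, write1 uses phase 1.

        Each write is an (address, data) pair or None. If both target the
        same register, the phase-1 write wins (an assumed policy).
        """
        for write in (write0, write1):
            self.fast_ticks += 1  # the write port is used once per fast tick
            if write is not None:
                addr, data = write
                self.regs[addr] = data & 0xFFFFFFFF  # assumed 32-bit registers


rf = DoublePumpedRegFile()
rf.core_cycle(write0=(1, 10), write1=(2, 20))  # two results in one core cycle
rf.core_cycle(write0=(3, 30), write1=(3, 31))  # same target: phase 1 wins
print(rf.read4(1, 2, 3, 0))
print(rf.fast_ticks)
```

The cost of this approach in hardware is that the write path has to meet timing at twice the core frequency, which may be one reason the overall clock is low.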

"alpha" <zhg.liu@gmail.com> writes:
> I designed a 2-way issue out-of-order processor recently. It has
> 8-slot-deep reorder buffers and 4 execution units: 2 ALUs, 1 branch unit,
> and 1 load/store (L/S) unit. Simple branch-taken scheme.
> It takes about 9000 LCs in a Xilinx V4 FX12 FPGA, running at 33 MHz with
> 1.2-1.5 IPC.
Sounds like an interesting challenge, and perhaps useful if you plan to move it to an ASIC, but I wouldn't think it would be too useful in an FPGA, since it's not too difficult to make a single-issue processor run at more than 100 MHz in a V4 part. Are there any published HDL designs for out-of-order superscalar processors? Everything I've looked at has been single-issue in-order.
It's just for fun to run it in an FPGA rather than making an ASIC. I
spent almost every Saturday over the past year on this project.
I should say, it is a lot more challenging than a single-issue design. True,
the clock speed is pretty low now (30-60 MHz), but I am planning to
improve it by using a smarter compiler to remove some corner cases
from the design. Hopefully I can get it into the 100 MHz range.
Also, it looks like I need to buy an evaluation kit from Altera. The Xilinx V4
block RAM gives me some trouble during instruction fetch.

My design is from scratch. Its instruction set is almost the same as the
MIPS R3000 (without multiplication). The lcc C compiler was ported.
I can publish the Verilog source files; do we have a public domain for
this purpose?

"alpha" <zhg.liu@gmail.com> writes:
> My design is from scratch. Its instruction set is almost the same as the
> MIPS R3000 (without multiplication). The lcc C compiler was ported.
> I can publish the Verilog source files; do we have a public domain for
> this purpose?
I'm not sure I understand your question. If you've written it yourself, you can certainly put it in the public domain if you so choose. Or you could release it under any of a number of existing "open source" licenses.

If you're asking about places to distribute it, you could try submitting it as a project for www.opencores.org. If you don't find anywhere else, you could send it to me, and I'd be happy to put it on a web page for you.

Eric
alpha wrote:
> It's just for fun to run it in an FPGA rather than making an ASIC. I
> spent almost every Saturday over the past year on this project.
> I should say, it is a lot more challenging than a single-issue design.
Very cool, congratulations.
> True,
> the clock speed is pretty low now (30-60 MHz), but I am planning to
> improve it by using a smarter compiler to remove some corner cases
> from the design.
How much work would it be to make it fully R3000 compatible (say, user-level only)?
> Hopefully I can get it into the 100 MHz range.
> Also, it looks like I need to buy an evaluation kit from Altera. The Xilinx V4
> block RAM gives me some trouble during instruction fetch.
Care to elaborate?
> My design is from scratch. Its instruction set is almost the same as the
> MIPS R3000 (without multiplication). The lcc C compiler was ported.
> I can publish the Verilog source files; do we have a public domain for
> this purpose?
Picking a license (e.g. GPL, LGPL, BSD, public domain, etc.) is a separate issue from publishing (opencores, sourceforge, someone's own web site [I'd be happy to host it for you, for example], etc.). Just be careful about third-party components (such as LCC) which come with their own license.

(IMHO: Out-of-order on an FPGA is very cool, but will probably not lead to the best performance/LUT ratio, especially not if you can also do compiler work. For single-threaded performance, Nios II and MicroBlaze are impressive, as is John Jakson's R16. A 2- or 4-way LIW may also make sense.)

Tommy
Tommy Thorn wrote:
> alpha wrote:
> > It's just for fun to run it in an FPGA rather than making an ASIC. I
> > spent almost every Saturday over the past year on this project.
> > I should say, it is a lot more challenging than a single-issue design.
>
> Very cool, congratulations.
>
> > True,
> > the clock speed is pretty low now (30-60 MHz), but I am planning to
> > improve it by using a smarter compiler to remove some corner cases
> > from the design.
>
> How much work would it be to make it fully R3000 compatible (say,
> user-level only)?
[I am not trying to make it binary compatible with the R3000. No delay slots. Right now there are 24 instructions in total. No multiplication. Actually, the V4's DSP core could be used to do multiplication. Precise interrupts work but are not well tested; I had a timer interrupt. It seems there is still a lot of work to be done.]
> > Hopefully I can get it into the 100 MHz range.
> > Also, it looks like I need to buy an evaluation kit from Altera. The Xilinx V4
> > block RAM gives me some trouble during instruction fetch.
>
> Care to elaborate?
[The design originally targeted an Altera APEX II board, so the instruction fetch unit fetched instructions from Altera's async block memory and interfaced to the reorder buffers. Later on, I switched to a bigger V4 chip, and I had to play some tricks to make it work with Xilinx's sync block memory.]
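
The difference between the two memory styles can be illustrated with a short Python sketch. This is only a behavioural model under simple assumptions, not alpha's fetch unit; the thread does not say what "trick" was used. The class names `AsyncRom` and `SyncRom` and the four-word program are made up for the example. An asynchronous-read memory returns data in the same cycle the address is presented, while a synchronous-read memory registers its output, so the data arrives one cycle later.

```python
class AsyncRom:
    """Asynchronous read: data is available in the same cycle as the address."""

    def __init__(self, words):
        self.words = list(words)

    def read(self, addr):
        return self.words[addr]


class SyncRom:
    """Synchronous read: the output is registered on each clock edge."""

    def __init__(self, words):
        self.words = list(words)
        self.q = None  # registered output, undefined before the first edge

    def clock(self, addr):
        """Return what the fetch stage sees this cycle, then latch mem[addr]."""
        visible = self.q
        self.q = self.words[addr]
        return visible


program = [0x10, 0x11, 0x12, 0x13]  # hypothetical instruction words

async_rom = AsyncRom(program)
async_seen = [async_rom.read(pc) for pc in range(4)]

sync_rom = SyncRom(program)
sync_seen = [sync_rom.clock(pc) for pc in range(4)]

print(async_seen)  # word for each PC arrives in the same cycle
print(sync_seen)   # each word arrives one cycle after its PC was presented
```

A fetch unit written for the first behaviour has to absorb that extra cycle somewhere when moved to the second, for example by presenting the next address one cycle early.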
> > My design is from scratch. Its instruction set is almost the same as the
> > MIPS R3000 (without multiplication). The lcc C compiler was ported.
> > I can publish the Verilog source files; do we have a public domain for
> > this purpose?
>
> Picking a license (e.g. GPL, LGPL, BSD, public domain, etc.) is a separate
> issue from publishing (opencores, sourceforge, someone's own web site
> [I'd be happy to host it for you, for example], etc.). Just be careful about
> third-party components (such as LCC) which come with their own license.
[I need to think about it; thanks anyway. Of course, I do not want to get into any trouble.]
> (IMHO: Out-of-Order on an FPGA is very cool, but will probably not lead
> to the best performance/LUT ratio - especially not if you can do
> compiler work also. For single threaded performance, Nios II and
> MicroBlaze are impressive, as is John Jakson's R16. A 2- or 4-way LIW
> may also make sense).
[This design was started in ModelSim just for fun a year ago. The goal was to make an out-of-order superscalar processor; I did not even know whether I could synthesize it for an FPGA. 9000 LEs is pretty big. 50 MHz * 1.5 (IPC) = 75 MIPS, which cannot compete with MicroBlaze or Nios. But it is a good try. Agreed, a simple 2- or 4-way statically scheduled VLIW is really worth trying.]
> Tommy
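
For reference, the throughput arithmetic in the reply above can be written out as a small Python sketch. The clock and IPC figures for the out-of-order design are the ones quoted in this thread. The single-issue comparison point uses the 100 MHz figure mentioned earlier in the thread together with an assumed IPC of 1.0, which is an optimistic upper bound for a single-issue core and not a measured number.

```python
def mips(clock_mhz: float, ipc: float) -> float:
    """Millions of instructions per second = clock (MHz) * instructions per clock."""
    return clock_mhz * ipc


# Figures quoted in the thread for the out-of-order design.
ooo_measured_low = mips(33, 1.2)
ooo_measured_high = mips(33, 1.5)
ooo_estimate = mips(50, 1.5)

# Hypothetical comparison point: a single-issue core at 100 MHz.
# IPC = 1.0 is an assumed upper bound; real cores lose cycles to stalls.
single_issue_bound = mips(100, 1.0)

for label, value in [
    ("out-of-order, 33 MHz, IPC 1.2", ooo_measured_low),
    ("out-of-order, 33 MHz, IPC 1.5", ooo_measured_high),
    ("out-of-order, 50 MHz, IPC 1.5", ooo_estimate),
    ("single-issue, 100 MHz, IPC <= 1.0", single_issue_bound),
]:
    print(f"{label:<36s} {value:6.1f} MIPS")
```

Under these assumptions the single-issue core stays ahead unless the out-of-order design reaches roughly 67 MHz at an IPC of 1.5.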
"alpha" <zhg.liu@gmail.com> writes:
> [I am not trying to make it binary compatible with the R3000. No delay slots.
[...]
> [I need to think about it; thanks anyway. Of course, I do not want to get
> into any trouble.]
It appears that you've avoided the unaligned load/store instructions, which are patented. Between that and not actually being binary compatible, I don't think you're likely to get in any trouble if you choose to publish your design. Of course, I am not a lawyer.

Eric
Eric Smith wrote:
> "alpha" <zhg.liu@gmail.com> writes:
> > [I need to think about it; thanks anyway. Of course, I do not want to get
> > into any trouble.]
>
> It appears that you've avoided the unaligned load/store instructions,
> which are patented. Between that and not actually being binary
> compatible, I don't think you're likely to get in any trouble if you
> choose to publish your design. Of course, I am not a lawyer.
To begin with, it is controversial whether publishing a VHDL text that describes a patented method infringes the patent. A patent grants a monopoly on "making, using, selling, offering for sale, or importing the patented invention for the term of the patent". Implementing the VHDL in an FPGA probably would be "making". But just describing the invention in a formal way on your web page should be OK, as the whole point of the patent system is to publish the invention. Trade secrets and patents are mutually exclusive.

Kolja Sulimma