Documenting a simple CPU

Started by Jonathan Bromley March 18, 2009
Some of us have been a little frustrated by the 
hard-to-follow documentation of the nibz CPU.
I thought it might be useful, as a comparison,
to expose to public ridicule the documentation
for a toy CPU I developed as fodder for some of
our training courses.  It doesn't yet have an
assembler (sorry Antti!!!) but I hope you will
agree that the docs are complete enough for you 
to write one should you so wish.

I'm not trying to stir up interest in this CPU
design, even though I think it's quite cute; 
there are far too many RISC soft-cores out there
already.  I just wanted to give an example of how
you might go about documenting such a thing without
putting in too much effort.  Its instruction set
is roughly of the same complexity as nibz, I think.

http://www.oxfordbromley.plus.com/files/miniCPU/arch.pdf
-- 
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
jonathan.bromley@MYCOMPANY.com
http://www.MYCOMPANY.com

The contents of this message may contain personal views which 
are not the views of Doulos Ltd., unless specifically stated.
On Mar 18, 3:29 pm, Jonathan Bromley <jonathan.brom...@MYCOMPANY.com>
wrote:
> Some of us have been a little frustrated by the
> hard-to-follow documentation of the nibz CPU.
> I thought it might be useful, as a comparison,
> to expose to public ridicule the documentation
> for a toy CPU I developed as fodder for some of
> our training courses.  It doesn't yet have an
> assembler (sorry Antti!!!) but I hope you will
> agree that the docs are complete enough for you
> to write one should you so wish.
>
> I'm not trying to stir up interest in this CPU
> design, even though I think it's quite cute;
> there are far too many RISC soft-cores out there
> already.  I just wanted to give an example of how
> you might go about documenting such a thing without
> putting in too much effort.  Its instruction set
> is roughly of the same complexity as nibz, I think.
>
> http://www.oxfordbromley.plus.com/files/miniCPU/arch.pdf
Hi,

It is definitely interesting already, because it has human-readable
documentation, and creating an assembler based on the docs can be
accomplished in less than an hour.

Antti
Or you could use something like cgen (http://sourceware.org/cgen/),
then you can have your assembler and simulator generated automatically
for you (no reason why this couldn't also be extended to generating
GCC backend and prototype HDL too).

Jon
On Mar 18, 9:29 am, Jonathan Bromley <jonathan.brom...@MYCOMPANY.com>
wrote:
> Some of us have been a little frustrated by the
> hard-to-follow documentation of the nibz CPU.
> I thought it might be useful, as a comparison,
> to expose to public ridicule the documentation
> for a toy CPU I developed as fodder for some of
> our training courses.  It doesn't yet have an
> assembler (sorry Antti!!!) but I hope you will
> agree that the docs are complete enough for you
> to write one should you so wish.
>
> I'm not trying to stir up interest in this CPU
> design, even though I think it's quite cute;
> there are far too many RISC soft-cores out there
> already.  I just wanted to give an example of how
> you might go about documenting such a thing without
> putting in too much effort.  Its instruction set
> is roughly of the same complexity as nibz, I think.
>
> http://www.oxfordbromley.plus.com/files/miniCPU/arch.pdf
I think this is a very interesting processor spec.  It is very simple,
yet very functional.  I can see many applications for it.

A comment...  The logical shift left operation is not really needed
unless I am mistaken.  Since the opcode is free to specify any
combination of operands, you can use the same value for operand A and
B with an ADD operation which will result in a left shift.  Have any
of the students noticed this?

Rick
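Rick's observation is easy to check exhaustively. Here is a minimal Python sketch, assuming a 16-bit data path whose results wrap modulo 2^16. The word width is an assumption, and the names `add16` and `lsl16` are hypothetical, not taken from the CPU documentation:

```python
MASK = 0xFFFF  # assumed 16-bit data path (not confirmed by the thread)

def add16(a, b):
    """Add two words, discarding the carry out of bit 15."""
    return (a + b) & MASK

def lsl16(a):
    """Logical shift left by one bit, discarding the bit shifted out."""
    return (a << 1) & MASK

# Compare ADD x,x against a one-bit left shift for every 16-bit value.
mismatches = [x for x in range(MASK + 1) if add16(x, x) != lsl16(x)]
print(len(mismatches))  # prints 0
```

This only compares the result values. Whether the two instructions also set the flags identically depends on how the real instruction set defines them, which this sketch does not model.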
Jonathan Bromley wrote:

> Some of us have been a little frustrated by the
> hard-to-follow documentation of the nibz CPU.
> I thought it might be useful, as a comparison,
> to expose to public ridicule the documentation
> for a toy CPU I developed as fodder for some of
> our training courses.  It doesn't yet have an
> assembler (sorry Antti!!!) but I hope you will
> agree that the docs are complete enough for you
> to write one should you so wish.
Err, with no assembler, how do you run the compiled CPU?

Add one yourself to this? AS Assembler?
http://john.ccac.rwth-aachen.de:8000/as/download.html

Good docs, but a resource report is missing? Given the 4+ year
timeline, perhaps a couple of target FPGAs at opposite ends of that
time frame?

I see it's a 3-operand design with 16-bit opcodes. More natural for
an FPGA would be 18-bit opcodes? And perhaps a register frame pointer,
so that larger block RAM can be better accessed? Perhaps r0, as
allocating that to 0 for ALL opcodes seems a tad wasteful.

FPGAs give you very good 'free' RAM resource, so the best soft CPUs
start from RAM size and work backwards.

-jg
On Thu, 19 Mar 2009 01:14:31 -0700 (PDT), Jim Granville wrote:

>Err, with no assembler, how do you run the compiled cpu ?
Hand-coding. Yup, really. One of the interesting benefits of a super-simple instruction set is that this can be done, for small programs - and small programs is all I've ever run on it (see below).
>Good docs, but missing is a resource report ?
Irrelevant in the target application (teaching HDL syntax and
techniques).  Around 450-500 logic cells (4-LUT+FF) in typical FPGAs;
about 90MHz system clock rate; instructions take between 3 and 7
system clocks to execute if the APB-connected memory has no wait
states.  The RTL implementation is pretty dumb, and could easily be
made much faster and tighter.  The only criterion for the present
implementation was that the RTL CPU should be synthesisable.

[snip sundry interesting comments]

But the purpose of this design was to create a piece of Verilog code
that does interesting things, could be modified (specifically, have
SystemVerilog language features grafted on), and was small enough for
students to find the relevant bits easily in a 50-minute lab session.
I'm actually working on a real-world version, for my own amusement,
but Ye Olde Original does what it aimed to do and I don't plan on
fixing it :-)
-- 
Jonathan Bromley
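For a rough sense of what those figures mean, the quoted clock rate and cycle counts can be turned into an instruction-rate range. This is a back-of-envelope sketch in Python using only the numbers in the post (about 90MHz, 3 to 7 clocks per instruction, zero wait states). The function name is made up for illustration:

```python
def instructions_per_second(clock_hz, clocks_per_instruction):
    """Instruction rate if every instruction took the given number of clocks."""
    return clock_hz / clocks_per_instruction

CLOCK_HZ = 90e6  # "about 90MHz" from the post; approximate

best = instructions_per_second(CLOCK_HZ, 3)   # every instruction takes 3 clocks
worst = instructions_per_second(CLOCK_HZ, 7)  # every instruction takes 7 clocks
print(f"{worst / 1e6:.1f} to {best / 1e6:.1f} million instructions per second")
```

The real figure depends on the instruction mix and on memory wait states, neither of which is given in the thread.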
On Thu, 19 Mar 2009 00:57:36 -0700 (PDT), rickman wrote:

>The logical shift left operation is not really needed
>unless I am mistaken.  Since the opcode is free to specify any
>combination of operands, you can use the same value for operand A and
>B with an ADD operation which will result in a left shift.  Have any
>of the students noticed this?
No, but in fairness they would not be expected to; the design is used
on language and verification courses, and the CPU architecture is only
there to get a wide range of interesting behaviours from a small
design.

There is an embarrassingly large amount of redundancy and overlap in
the instruction set.  I'm working on it :-)
-- 
Jonathan Bromley
On Mar 19, 9:00 pm, Jonathan Bromley <jonathan.brom...@MYCOMPANY.com>
wrote:
> On Thu, 19 Mar 2009 01:14:31 -0700 (PDT), Jim Granville wrote:
> >Err, with no assembler, how do you run the compiled cpu ?
>
> Hand-coding.  Yup, really.  One of the interesting
> benefits of a super-simple instruction set is that
> this can be done, for small programs - and small
> programs is all I've ever run on it (see below).
>
> >Good docs, but missing is a resource report ?
>
> Irrelevant in the target application (teaching
> HDL syntax and techniques).
Wow - now there's a surprising comment. I'd hope any student taught
HDL would also understand WHAT a resource report was, where to find
it, and what his HDL could be expected to use.

So you don't connect the students to the silicon at all?
> Around 450-500
> logic cells (4-LUT+FF) in typical FPGAs;
> about 90MHz system clock rate; instructions
> take between 3 and 7 system clocks to execute
> if the APB-connected memory has no wait states.
> The RTL implementation is pretty dumb, and
> could easily be made much faster and tighter.
> The only criterion for the present implementation
> was that the RTL CPU should be synthesisable.
>
> [snip sundry interesting comments]
>
> But the purpose of this design was to create
> a piece of Verilog code that does interesting
> things, could be modified (specifically, have
> SystemVerilog language features grafted on),
> and was small enough for students to find
> the relevant bits easily in a 50-minute
> lab session.  I'm actually working on a
> real-world version, for my own amusement,
> but Ye Olde Original does what it aimed
> to do and I don't plan on fixing it :-)

You could make that an exercise for the student ;)

-jg
On Thu, 19 Mar 2009 02:36:17 -0700 (PDT), -jg wrote:

>So you don't connect the students to the silicon at all ?
Yes, obviously; professional training makes no sense unless it's
anchored in the real world.  We have courses where students download
designs to demo boards, and look closely at the impact of VHDL or
Verilog coding decisions on implementation.

But the specific course for which the CPU design was written is not
focused on implementation concerns; the students generally are
experienced designers who want to know more about what SystemVerilog
can do for them.  They're quite grown-up enough to make their own
decisions about implementation issues!

We have subsequently re-used the design on verification courses, where
it's simply a way to get a bunch of interesting and varied activity
without excessive complexity.  Again, in that context the (poor)
efficiency of the design is not relevant to the course content.

If a student on a course actually cared about improving the
implementation of this design, my colleagues and I would be more than
happy to discuss it.  But it's a digression.
-- 
Jonathan Bromley
"Jonathan Bromley" <jonathan.bromley@MYCOMPANY.com> wrote in message 
news:nd24s4ds3v4ig9hmo49v83iegr26heicqo@4ax.com...
>
> There is an embarrassingly large amount of redundancy
> and overlap in the instruction set.  I'm working on it :-)
> --
> Jonathan Bromley, Consultant
>
Right. On the soft processor I did, there are 'free' opcodes that are
useless, but it would take more fabric to turn them off.

By the way, you might wanna stop reading here if you're not interested
in 'yet another processor' design!

In this hobby processor I designed, I started from what the fabric
would easily support. For example, I used 16 16-bit registers because
that fits into 16 LUTs. I picked AND/OR/XOR/MOV because that's one LUT
per bit. Likewise for shift/rotate left/right. I could use parallel
data and opcode fetch because a blockram has two ports. The stack was
also a block ram, with one port for registers, one for the PC, but the
same pointer connected to both ports.

I was particularly pleased to get ADD/SUB/ADC/SBC/INC/DEC/INC C/DEC NC
all working in one LUT per bit, albeit by instantiating the carry
elements. Maybe the synthesiser would be better these days?

In the end, I even added interrupts, which was surprisingly easy as I
already had CALL and RET. I just needed a couple of FFs to store the
zero and carry flags. The eight to one mux into the register bank used
the MUXF5/6 things to source data from the ALU, the logic/shift,
stack, lsb multiplier, msb multiplier, immediate value, RAM or I/O.

This FPGA-centric approach meant that the whole thing fits in about
250 LUTs, 2 BRAMs and 1 multiplier thingy, and runs at over 100MHz in
a V2PRO, which is over 100MIPS because most instructions take one
cycle, even relative jumps. Bloody zero flag had the worst timing.

Like everyone says, the assembler took me a morning to write in Perl.
As for documentation, it's in VHDL. That's it, right? ;-)

One interesting thing is that it was easy to let the processor do
things like READ R0,[R1++] because I could increment the registers at
the same time as using them as indirect addresses.

For me the cool part of the exercise was to get as much out of the
processor with the least FPGA resource. (Probably because I cut my
teeth on XC2064s and XC3010s.)

It's interesting to compare it with Jonathan's processor which had a
completely different focus and application. Instantiating carry
primitives probably isn't his main teaching objective!

Cheers, Syms.
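The post-increment read that Syms mentions, READ R0,[R1++], can be modelled in a few lines. This is a behavioural Python sketch only, not his VHDL. The 16 registers of 16 bits come from the post, while the function name, the dictionary used as memory and the 16-bit address wrap are assumptions made for illustration:

```python
MASK = 0xFFFF  # 16-bit registers per the post; 16-bit address wrap is assumed

def read_postinc(regs, mem, dst, ptr):
    """Model READ Rdst,[Rptr++]: load from the address in Rptr, then bump Rptr.

    Both updates are computed from the old pointer value, as they would be
    if the load and the increment happened in the same clock cycle.
    If dst and ptr name the same register, this model lets the increment
    win; what the real hardware does in that case is not stated.
    """
    addr = regs[ptr]
    regs[dst] = mem[addr]
    regs[ptr] = (addr + 1) & MASK

regs = [0] * 16                          # sixteen registers, R0..R15
mem = {0x0100: 0xBEEF, 0x0101: 0xCAFE}   # hypothetical memory contents
regs[1] = 0x0100
read_postinc(regs, mem, dst=0, ptr=1)
print(hex(regs[0]), hex(regs[1]))  # prints 0xbeef 0x101
```

Calling `read_postinc` again with the same arguments would fetch the next word, which is what makes the addressing mode handy for walking through a table or buffer.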